The python spreadsheet: Another look (Traxter DSL)
I apologize in advance for any ugly amateurism in this post. It's my first attempt at a domain specific language :-)
Yesterday I posted about using PyCells to write a spreadsheet in Python.
Sadly, I can't figure out the problem with my code, and the PyCells mailing list seems to be pretty much dead.
So, I started thinking... what other ways are to achieve my goal? And decided to go medieval on this problem.
By that I mean that I will do it the most traditional way possible... with a twist.
The traditional way is, of course, to write one or more of lexer/parser/interpreter/compiler for the formula language.
Mind you, I don't intend to do anything complete, much less Excel-compatible (see Excel formula parsers are hell in this same blog.
So, let's start with a toy language, supporting the following:
Assignment to a variable
Classic 4-op arithmetics.
Function calls
Cell ranges
That's enough for a toy spreadsheet, and it should be easy to extend.
Here's a description of the grammar for such a language, written using Aperiot [1]:
# This is a simple language for arithmetic expressions numbers number operators plus "+" times "*" minus "-" div "/" equal "=" colon ":" comma "," semicolon ";" brackets lpar "(" rpar ")" identifiers label start LIST rules LIST -> ASSIGNMENT : "[$1]" | ASSIGNMENT semicolon LIST : "[$1]+$3" | ASSIGNMENT semicolon : "[$1]" ASSIGNMENT -> label equal EXPR : "($1,$3)" ARGLIST -> ARG comma ARGLIST : "[$1]+$3" | ARG : "[$1]" ARG -> RANGE : "$1" | EXPR : "$1" | label : "$1" EXPR -> TERM : "$1" | TERM plus EXPR : "(\'+\',$1,$3)" | TERM minus EXPR : "(\'-\',$1,$3)" TERM -> FACTOR : "$1" | FACTOR times TERM : "(\'*\',$1,$3)" | FACTOR div TERM : "(\'/\',$1,$3)" FACTOR -> number : "$1.val()" | lpar EXPR rpar : "(\'group\',$2)" | FUNCALL : "$1" | label : "$1" | minus FACTOR : "-$2" FUNCALL -> label lpar ARGLIST rpar : "(\'funcall\',$1,$3)" RANGE -> label colon label : "(\'range\',$1,$3)"
This transforms this:
A1=SUM(A1:A7)*2; A3=2+2;
Into this:
[(<aperiot.lexer.Identifier instance at 0xb7af10ac>, ('*', ('funcall', <aperiot.lexer.Identifier instance at 0xb7af142c>, [('range', <aperiot.lexer.Identifier instance at 0xb7af15cc>, <aperiot.lexer.Identifier instance at 0xb7af144c>)]), 2)), (<aperiot.lexer.Identifier instance at 0xb7b4c72c>, ('+', 2, 2))]
Which is sort of a tree with all the expressions in prefix notation in them.
Now, here is the twist: I will "compile" this tree into.... python code. So I can use eval to do the evaluation, just like in the original python spreadsheet recipe.
So this is sort of a preprocessor:
The user writes excel-like formulas.
The spreadsheet stores python code obtained through compilation.
The spreadsheet evals the python code.
Of course we have the same problem as usual: cell dependencies, which is the reason why I started playing with PyCells in the first place!
But... well, here's another trick: since I am compiling, I know whenever there is a variable referenced in the code. And I can remember them :-)
So, I can turn this:
A1=SUM(A1:A3)*2; A3=2+2;
Into this:
[['A1=SUM(a1,a2,a3)*2;', set(['a1', 'a3', 'a2'])], ['A3=2+2;', set([])]]
The "compiled" python code and a dependency set. And voila, this spreadsheet will propagate correctly.
Here's the compiler... in about 60 lines of python [2]. And since the whole point of this language is to track dependencies... let's call it Traxter.
Of course, this is a toy right now. But it's a toy with potential!
from pprint import pprint from aperiot.parsergen import build_parser import aperiot import cellutils import sys dependencies=set() def addOp(*args): return '+'.join([compile_token(a) for a in args]) def mulOp(*args): return '*'.join([compile_token(a) for a in args]) def subOp(*args): return '-'.join([compile_token(a) for a in args]) def divOp(*args): return '/'.join([compile_token(a) for a in args]) def groupOp(*args): return '(%s)'%compile_token(args[0]) def funcOp(*args): return '%s(%s)'%(args[0].symbolic_name, ','.join([compile_token(a) for a in args[1]])) def rangeOp(*args): c1=args[0].symbolic_name c2=args[1].symbolic_name return ','.join([compile_token(a) for a in cellutils.cellrange(c1,c2)]) operators={'+':addOp, '-':subOp, '*':mulOp, '/':divOp, 'group':groupOp, 'funcall':funcOp, 'range':rangeOp } def compile_token(token): if isinstance (token,aperiot.lexer.Identifier): v=token.symbolic_name.lower() dependencies.add(v) return v if isinstance(token,list) or isinstance(token,tuple): return apply(operators[token[0]],token[1:]) return str(token) def compile_assignment(tokens): target=tokens[0].symbolic_name compiled=compile_token(tokens[1]) return '%s=%s;'%(target,compiled) myparser = build_parser('traxter') t='A1=SUM(A1:A7)*2;A3=2+2;' assign_list=myparser.parse(t) pprint (assign_list) compiled=[] for assignment in assign_list: dependencies=set() c=compile_assignment(assignment) compiled.append([c,dependencies]) print compiled