The python spreadsheet: Another look (Traxter DSL)
I apologize in advance for any ugly amateurism in this post. It's my first attempt at a domain specific language :-)
Yesterday I posted about using PyCells to write a spreadsheet in Python.
Sadly, I can't figure out the problem with my code, and the PyCells mailing list seems to be pretty much dead.
So, I started thinking... what other ways are to achieve my goal? And decided to go medieval on this problem.
By that I mean that I will do it the most traditional way possible... with a twist.
The traditional way is, of course, to write one or more of lexer/parser/interpreter/compiler for the formula language.
Mind you, I don't intend to do anything complete, much less Excel-compatible (see Excel formula parsers are hell in this same blog.
So, let's start with a toy language, supporting the following:
Assignment to a variable
Classic 4-op arithmetics.
Function calls
Cell ranges
That's enough for a toy spreadsheet, and it should be easy to extend.
Here's a description of the grammar for such a language, written using Aperiot 1:
# This is a simple language for arithmetic expressions numbers number operators plus "+" times "*" minus "-" div "/" equal "=" colon ":" comma "," semicolon ";" brackets lpar "(" rpar ")" identifiers label start LIST rules LIST -> ASSIGNMENT : "[$1]" | ASSIGNMENT semicolon LIST : "[$1]+$3" | ASSIGNMENT semicolon : "[$1]" ASSIGNMENT -> label equal EXPR : "($1,$3)" ARGLIST -> ARG comma ARGLIST : "[$1]+$3" | ARG : "[$1]" ARG -> RANGE : "$1" | EXPR : "$1" | label : "$1" EXPR -> TERM : "$1" | TERM plus EXPR : "(\'+\',$1,$3)" | TERM minus EXPR : "(\'-\',$1,$3)" TERM -> FACTOR : "$1" | FACTOR times TERM : "(\'*\',$1,$3)" | FACTOR div TERM : "(\'/\',$1,$3)" FACTOR -> number : "$1.val()" | lpar EXPR rpar : "(\'group\',$2)" | FUNCALL : "$1" | label : "$1" | minus FACTOR : "-$2" FUNCALL -> label lpar ARGLIST rpar : "(\'funcall\',$1,$3)" RANGE -> label colon label : "(\'range\',$1,$3)"
This transforms this:
A1=SUM(A1:A7)*2; A3=2+2;
Into this:
[(<aperiot.lexer.Identifier instance at 0xb7af10ac>, ('*', ('funcall', <aperiot.lexer.Identifier instance at 0xb7af142c>, [('range', <aperiot.lexer.Identifier instance at 0xb7af15cc>, <aperiot.lexer.Identifier instance at 0xb7af144c>)]), 2)), (<aperiot.lexer.Identifier instance at 0xb7b4c72c>, ('+', 2, 2))]
Which is sort of a tree with all the expressions in prefix notation in them.
Now, here is the twist: I will "compile" this tree into.... python code. So I can use eval to do the evaluation, just like in the original python spreadsheet recipe.
So this is sort of a preprocessor:
The user writes excel-like formulas.
The spreadsheet stores python code obtained through compilation.
The spreadsheet evals the python code.
Of course we have the same problem as usual: cell dependencies, which is the reason why I started playing with PyCells in the first place!
But... well, here's another trick: since I am compiling, I know whenever there is a variable referenced in the code. And I can remember them :-)
So, I can turn this:
A1=SUM(A1:A3)*2; A3=2+2;
Into this:
[['A1=SUM(a1,a2,a3)*2;', set(['a1', 'a3', 'a2'])], ['A3=2+2;', set([])]]
The "compiled" python code and a dependency set. And voila, this spreadsheet will propagate correctly.
Here's the compiler... in about 60 lines of python 2. And since the whole point of this language is to track dependencies... let's call it Traxter.
Of course, this is a toy right now. But it's a toy with potential!
from pprint import pprint
from aperiot.parsergen import build_parser
import aperiot
import cellutils
import sys
dependencies=set()
def addOp(*args):
return '+'.join([compile_token(a) for a in args])
def mulOp(*args):
return '*'.join([compile_token(a) for a in args])
def subOp(*args):
return '-'.join([compile_token(a) for a in args])
def divOp(*args):
return '/'.join([compile_token(a) for a in args])
def groupOp(*args):
return '(%s)'%compile_token(args[0])
def funcOp(*args):
return '%s(%s)'%(args[0].symbolic_name,
','.join([compile_token(a) for a in args[1]]))
def rangeOp(*args):
c1=args[0].symbolic_name
c2=args[1].symbolic_name
return ','.join([compile_token(a) for a in cellutils.cellrange(c1,c2)])
operators={'+':addOp,
'-':subOp,
'*':mulOp,
'/':divOp,
'group':groupOp,
'funcall':funcOp,
'range':rangeOp
}
def compile_token(token):
if isinstance (token,aperiot.lexer.Identifier):
v=token.symbolic_name.lower()
dependencies.add(v)
return v
if isinstance(token,list) or isinstance(token,tuple):
return apply(operators[token[0]],token[1:])
return str(token)
def compile_assignment(tokens):
target=tokens[0].symbolic_name
compiled=compile_token(tokens[1])
return '%s=%s;'%(target,compiled)
myparser = build_parser('traxter')
t='A1=SUM(A1:A7)*2;A3=2+2;'
assign_list=myparser.parse(t)
pprint (assign_list)
compiled=[]
for assignment in assign_list:
dependencies=set()
c=compile_assignment(assignment)
compiled.append([c,dependencies])
print compiled
- 1
-
You may be asking yourself:what the heck is Aperiot? Or Why the heck Aperiot? Well... I had never heard of it until 6 hours ago, and I just wrote a DSL using it. That means it's worth knowing.
- 2
-
cellrange() is left as an exercise for the reader because my current implementation is shameful ;-)