I apologize in advance for any ugly amateurism in this post. It's my first attempt at a domain specific language :-)
Yesterday I posted about using PyCells to write a spreadsheet in Python.
Sadly, I can't figure out the problem with my code, and the PyCells mailing list seems to be pretty much dead.
So, I started thinking... what other ways are to achieve my goal? And decided to go medieval on this problem.
By that I mean that I will do it the most traditional way possible... with a twist.
The traditional way is, of course, to write one or more of lexer/parser/interpreter/compiler for the formula language.
Mind you, I don't intend to do anything complete, much less Excel-compatible (see Excel formula parsers are hell in this same blog.
So, let's start with a toy language, supporting the following:
That's enough for a toy spreadsheet, and it should be easy to extend.
Here's a description of the grammar for such a language, written using Aperiot :
# This is a simple language for arithmetic expressions
numbers
number
operators
plus "+"
times "*"
minus "-"
div "/"
equal "="
colon ":"
comma ","
semicolon ";"
brackets
lpar "("
rpar ")"
identifiers
label
start
LIST
rules
LIST -> ASSIGNMENT : "[$1]"
| ASSIGNMENT semicolon LIST : "[$1]+$3"
| ASSIGNMENT semicolon : "[$1]"
ASSIGNMENT -> label equal EXPR : "($1,$3)"
ARGLIST -> ARG comma ARGLIST : "[$1]+$3"
| ARG : "[$1]"
ARG -> RANGE : "$1"
| EXPR : "$1"
| label : "$1"
EXPR -> TERM : "$1"
| TERM plus EXPR : "(\'+\',$1,$3)"
| TERM minus EXPR : "(\'-\',$1,$3)"
TERM -> FACTOR : "$1"
| FACTOR times TERM : "(\'*\',$1,$3)"
| FACTOR div TERM : "(\'/\',$1,$3)"
FACTOR -> number : "$1.val()"
| lpar EXPR rpar : "(\'group\',$2)"
| FUNCALL : "$1"
| label : "$1"
| minus FACTOR : "-$2"
FUNCALL -> label lpar ARGLIST rpar : "(\'funcall\',$1,$3)"
RANGE -> label colon label : "(\'range\',$1,$3)"
This transforms this:
A1=SUM(A1:A7)*2;
A3=2+2;
Into this:
[(<aperiot.lexer.Identifier instance at 0xb7af10ac>,
('*',
('funcall',
<aperiot.lexer.Identifier instance at 0xb7af142c>,
[('range',
<aperiot.lexer.Identifier instance at 0xb7af15cc>,
<aperiot.lexer.Identifier instance at 0xb7af144c>)]),
2)),
(<aperiot.lexer.Identifier instance at 0xb7b4c72c>, ('+', 2, 2))]
Which is sort of a tree with all the expressions in prefix notation in them.
Now, here is the twist: I will "compile" this tree into.... python code. So I can use eval to do the evaluation, just like in the original python spreadsheet recipe.
So this is sort of a preprocessor:
The user writes excel-like formulas.
The spreadsheet stores python code obtained through compilation.
The spreadsheet evals the python code.
Of course we have the same problem as usual: cell dependencies, which is the reason why I started playing with PyCells in the first place!
But... well, here's another trick: since I am compiling, I know whenever there is a variable referenced in the code. And I can remember them :-)
So, I can turn this:
A1=SUM(A1:A3)*2;
A3=2+2;
Into this:
[['A1=SUM(a1,a2,a3)*2;', set(['a1', 'a3', 'a2'])],
['A3=2+2;', set([])]]
The "compiled" python code and a dependency set. And voila, this spreadsheet will propagate correctly.
Here's the compiler... in about 60 lines of python . And since the whole point of this language is to track dependencies... let's call it Traxter.
Of course, this is a toy right now. But it's a toy with potential!
from pprint import pprint
from aperiot.parsergen import build_parser
import aperiot
import cellutils
import sys
dependencies=set()
def addOp(*args):
return '+'.join([compile_token(a) for a in args])
def mulOp(*args):
return '*'.join([compile_token(a) for a in args])
def subOp(*args):
return '-'.join([compile_token(a) for a in args])
def divOp(*args):
return '/'.join([compile_token(a) for a in args])
def groupOp(*args):
return '(%s)'%compile_token(args[0])
def funcOp(*args):
return '%s(%s)'%(args[0].symbolic_name,
','.join([compile_token(a) for a in args[1]]))
def rangeOp(*args):
c1=args[0].symbolic_name
c2=args[1].symbolic_name
return ','.join([compile_token(a) for a in cellutils.cellrange(c1,c2)])
operators={'+':addOp,
'-':subOp,
'*':mulOp,
'/':divOp,
'group':groupOp,
'funcall':funcOp,
'range':rangeOp
}
def compile_token(token):
if isinstance (token,aperiot.lexer.Identifier):
v=token.symbolic_name.lower()
dependencies.add(v)
return v
if isinstance(token,list) or isinstance(token,tuple):
return apply(operators[token[0]],token[1:])
return str(token)
def compile_assignment(tokens):
target=tokens[0].symbolic_name
compiled=compile_token(tokens[1])
return '%s=%s;'%(target,compiled)
myparser = build_parser('traxter')
t='A1=SUM(A1:A7)*2;A3=2+2;'
assign_list=myparser.parse(t)
pprint (assign_list)
compiled=[]
for assignment in assign_list:
dependencies=set()
c=compile_assignment(assignment)
compiled.append([c,dependencies])
print compiled