Ralsina.Me: Roberto Alsina's website

The python spreadsheet: Another look (Traxter DSL)

I apologize in advance for any ugly amateurism in this post. It's my first attempt at a domain-specific language :-)

Yesterday I posted about using PyCells to write a spreadsheet in Python.

Sadly, I can't figure out the problem with my code, and the PyCells mailing list seems to be pretty much dead.

So I started thinking: what other ways are there to achieve my goal? And I decided to go medieval on this problem.

By that I mean that I will do it in the most traditional way possible... with a twist.

The traditional way is, of course, to write some combination of lexer, parser, interpreter, and compiler for the formula language.

Mind you, I don't intend to do anything complete, much less Excel-compatible (see "Excel formula parsers are hell" in this same blog).

So, let's start with a toy language, supporting the following:

  • Assignment to a variable

  • Classic four-operation arithmetic

  • Function calls

  • Cell ranges

That's enough for a toy spreadsheet, and it should be easy to extend.
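As a point of reference for the "lexer" part of that list, here is a minimal tokenizer sketch in Python 3, written with the standard `re` module rather than Aperiot. The token names mirror the ones the Aperiot grammar declares; `TOKEN_SPEC` and `tokenize` are hypothetical names, and accepting decimal numbers is an assumption.

```python
import re

# Hypothetical tokenizer sketch (not Aperiot's lexer). Token names mirror the
# Aperiot grammar; accepting decimals in numbers is an assumption.
TOKEN_SPEC = [
    ("number", r"\d+(?:\.\d+)?"),
    ("label", r"[A-Za-z_][A-Za-z0-9_]*"),  # cell names like A1, function names like SUM
    ("plus", r"\+"),
    ("minus", r"-"),
    ("times", r"\*"),
    ("div", r"/"),
    ("equal", r"="),
    ("colon", r":"),
    ("comma", r","),
    ("semicolon", r";"),
    ("lpar", r"\("),
    ("rpar", r"\)"),
    ("skip", r"\s+"),  # whitespace, discarded
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Yield (kind, text) pairs, raising SyntaxError on an unexpected character."""
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            yield match.lastgroup, match.group()
        pos = match.end()
```

For example, `list(tokenize("A3=2+2;"))` gives `[('label', 'A3'), ('equal', '='), ('number', '2'), ('plus', '+'), ('number', '2'), ('semicolon', ';')]`. Turning that flat stream into structure is the parser's job.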

Here's a description of the grammar for such a language, written using Aperiot [1]:

# This is a simple language for arithmetic expressions

numbers
     number

operators
     plus   "+"
     times  "*"
     minus  "-"
     div    "/"
     equal  "="
     colon ":"
     comma ","
     semicolon ";"

brackets
     lpar  "("
     rpar  ")"


identifiers
     label

start
     LIST

rules

LIST             -> ASSIGNMENT                : "[$1]"
                  | ASSIGNMENT semicolon LIST : "[$1]+$3"
                  | ASSIGNMENT semicolon : "[$1]"

ASSIGNMENT       -> label equal EXPR : "($1,$3)"

ARGLIST          -> ARG comma ARGLIST : "[$1]+$3"
                  | ARG          : "[$1]"

ARG              -> RANGE       : "$1"
                  | EXPR        : "$1"
                  | label       : "$1"

EXPR             -> TERM              : "$1"
                  | TERM plus EXPR    : "(\'+\',$1,$3)"
                  | TERM minus EXPR   : "(\'-\',$1,$3)"

TERM             -> FACTOR               : "$1"
                  | FACTOR times TERM    : "(\'*\',$1,$3)"
                  | FACTOR div TERM      : "(\'/\',$1,$3)"


FACTOR           -> number           : "$1.val()"
                  | lpar EXPR rpar  : "(\'group\',$2)"
                  | FUNCALL     : "$1"
                  | label               : "$1"
                  | minus FACTOR    : "-$2"

FUNCALL          ->  label lpar ARGLIST rpar : "(\'funcall\',$1,$3)"

RANGE            -> label colon label   : "(\'range\',$1,$3)"

This transforms this:

A1=SUM(A1:A7)*2;
A3=2+2;

Into this:

[(<aperiot.lexer.Identifier instance at 0xb7af10ac>,
  ('*',
   ('funcall',
    <aperiot.lexer.Identifier instance at 0xb7af142c>,
    [('range',
      <aperiot.lexer.Identifier instance at 0xb7af15cc>,
      <aperiot.lexer.Identifier instance at 0xb7af144c>)]),
   2)),
 (<aperiot.lexer.Identifier instance at 0xb7b4c72c>, ('+', 2, 2))]

That is, roughly, a tree holding all the expressions in prefix notation.
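To make the shape of that tree concrete without Aperiot, here is a hand-written recursive-descent parser sketch (Python 3) that follows the same grammar rules and builds the same kind of nested prefix tuples. It is not how Aperiot works internally: plain strings stand in for Aperiot's `Identifier` objects, unary minus is left out to keep it short, and `Parser`/`parse` are hypothetical names.

```python
import re

# Hypothetical sketch, not Aperiot. A token is a number, a label (A1, SUM, ...),
# or any other single non-space character (operators and brackets).
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))")


class Parser:
    def __init__(self, text):
        self.tokens = [
            ("number", int(num)) if num else ("label", name) if name else ("op", sym)
            for num, name, sym in TOKEN_RE.findall(text)
        ]
        self.pos = 0

    def peek(self, offset=0):
        i = self.pos + offset
        return self.tokens[i] if i < len(self.tokens) else (None, None)

    def at_op(self, symbol):
        return self.peek() == ("op", symbol)

    def take(self, kind, value=None):
        tok_kind, tok_value = self.peek()
        if tok_kind != kind or (value is not None and tok_value != value):
            raise SyntaxError(f"expected {value or kind}, got {tok_value!r}")
        self.pos += 1
        return tok_value

    # LIST -> ASSIGNMENT (';' ASSIGNMENT)* [';']
    def parse_list(self):
        result = [self.parse_assignment()]
        while self.at_op(";"):
            self.take("op", ";")
            if self.peek()[0] is None:
                break
            result.append(self.parse_assignment())
        if self.peek()[0] is not None:
            raise SyntaxError(f"unexpected {self.peek()[1]!r}")
        return result

    # ASSIGNMENT -> label '=' EXPR
    def parse_assignment(self):
        target = self.take("label")
        self.take("op", "=")
        return (target, self.parse_expr())

    # EXPR -> TERM | TERM ('+' | '-') EXPR   (right-recursive, as in the grammar)
    def parse_expr(self):
        left = self.parse_term()
        if self.at_op("+") or self.at_op("-"):
            return (self.take("op"), left, self.parse_expr())
        return left

    # TERM -> FACTOR | FACTOR ('*' | '/') TERM
    def parse_term(self):
        left = self.parse_factor()
        if self.at_op("*") or self.at_op("/"):
            return (self.take("op"), left, self.parse_term())
        return left

    # FACTOR -> number | '(' EXPR ')' | FUNCALL | label   (unary minus omitted)
    def parse_factor(self):
        kind, value = self.peek()
        if kind == "number":
            return self.take("number")
        if self.at_op("("):
            self.take("op", "(")
            inner = self.parse_expr()
            self.take("op", ")")
            return ("group", inner)
        if kind == "label":
            name = self.take("label")
            if self.at_op("("):  # FUNCALL -> label '(' ARGLIST ')'
                self.take("op", "(")
                args = self.parse_arglist()
                self.take("op", ")")
                return ("funcall", name, args)
            return name
        raise SyntaxError(f"unexpected {value!r}")

    # ARGLIST -> ARG (',' ARG)*
    def parse_arglist(self):
        args = [self.parse_arg()]
        while self.at_op(","):
            self.take("op", ",")
            args.append(self.parse_arg())
        return args

    # ARG -> RANGE | EXPR; a RANGE (label ':' label) needs two tokens of lookahead.
    def parse_arg(self):
        if self.peek()[0] == "label" and self.peek(1) == ("op", ":"):
            start = self.take("label")
            self.take("op", ":")
            return ("range", start, self.take("label"))
        return self.parse_expr()


def parse(text):
    return Parser(text).parse_list()
```

Because `EXPR -> TERM minus EXPR` is right-recursive, `a=1-2-3` comes out as `('-', 1, ('-', 2, 3))`. Evaluated as a tree, that would give 2 instead of -4; turning the tree back into flat infix text, as the compiler in this post does, sidesteps the problem because Python re-associates the operators itself.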

Now, here is the twist: I will "compile" this tree into... Python code, so I can use eval to do the evaluation, just like in the original Python spreadsheet recipe.

So this is sort of a preprocessor:

  • The user writes Excel-like formulas.

  • The spreadsheet stores Python code obtained through compilation.

  • The spreadsheet evals the Python code.
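Here is a minimal sketch of the store-then-eval steps (Python 3), modeled loosely on the Python spreadsheet recipe: each cell holds a compiled Python expression, and reading a cell evaluates it with the sheet itself as `eval`'s local namespace, so references to other cells resolve recursively. `SpreadSheet` and the `SUM` helper are hypothetical, and the sketch assumes a cell stores only the right-hand side of a compiled assignment (`SUM(a1,a2)*2`, not `A1=SUM(a1,a2)*2;`), since `eval` accepts expressions, not statements.

```python
class SpreadSheet:
    """Hypothetical sheet: each cell holds a compiled Python expression."""

    # Functions compiled formulas may call; SUM here is a hypothetical helper.
    FUNCTIONS = {"SUM": lambda *args: sum(args)}

    def __init__(self):
        self._cells = {}

    def __setitem__(self, name, compiled_expr):
        self._cells[name.lower()] = compiled_expr

    def __getitem__(self, name):
        # With self as eval's locals, every cell name in the expression
        # (a1, a2, ...) is looked up through this same method, recursively.
        # Names that are not cells raise KeyError, so eval falls back to the
        # globals dict, where FUNCTIONS live.
        return eval(self._cells[name.lower()], dict(self.FUNCTIONS), self)
```

For example, after `sheet["A1"] = "2"`, `sheet["A2"] = "a1*3"` and `sheet["A3"] = "SUM(a1,a2)*2"`, reading `sheet["A3"]` gives 16. With no caching, every read recomputes the whole chain, and a circular reference ends in `RecursionError`.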

Of course, we still have the usual problem: cell dependencies, which is why I started playing with PyCells in the first place!

But... well, here's another trick: since I am compiling, I know every time a variable is referenced in the code. And I can remember those references :-)
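The trick boils down to collecting names while walking the tree. Here is a minimal sketch of that idea (Python 3), as a variation on the compiler in this post: instead of a module-level `dependencies` global, it threads a set through the recursion and returns it next to the code. It assumes the tree uses plain strings for labels (Aperiot produces `Identifier` objects), and the hypothetical `expand_range` only handles ranges within one column.

```python
import re


def split_cell(name):
    """Hypothetical helper: 'A12' -> ('a', 12)."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", name)
    if match is None:
        raise ValueError(f"not a cell name: {name!r}")
    return match.group(1).lower(), int(match.group(2))


def expand_range(start, end):
    """Hypothetical helper: expand a single-column range like A1:A3."""
    col1, row1 = split_cell(start)
    col2, row2 = split_cell(end)
    if col1 != col2:
        raise ValueError("this sketch only handles single-column ranges")
    return [f"{col1}{row}" for row in range(row1, row2 + 1)]


BINARY_OPS = {"+", "-", "*", "/"}


def compile_node(node, deps):
    """Return Python source for node, adding every referenced cell to deps."""
    if isinstance(node, str):  # a cell reference such as 'A1'
        name = node.lower()
        deps.add(name)
        return name
    if isinstance(node, tuple):
        op = node[0]
        if op in BINARY_OPS:
            return op.join(compile_node(arg, deps) for arg in node[1:])
        if op == "group":
            return "(%s)" % compile_node(node[1], deps)
        if op == "funcall":  # the function name itself is not a dependency
            return "%s(%s)" % (node[1], ",".join(compile_node(a, deps) for a in node[2]))
        if op == "range":
            return ",".join(compile_node(c, deps) for c in expand_range(node[1], node[2]))
        raise ValueError(f"unknown node {op!r}")
    return str(node)  # numbers


def compile_assignment(target, expr):
    """Return (compiled statement, dependency set) for one assignment."""
    deps = set()
    return "%s=%s;" % (target, compile_node(expr, deps)), deps
```

Joining operands with the bare operator is safe here because the grammar only nests `+` or `-` under `*` or `/` through a `group` node, so the emitted text keeps the original parentheses.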

So, I can turn this:

A1=SUM(A1:A3)*2;
A3=2+2;

Into this:

[['A1=SUM(a1,a2,a3)*2;', set(['a1', 'a3', 'a2'])],
 ['A3=2+2;', set([])]]

The "compiled" Python code plus a dependency set. And voilà: with that, this spreadsheet can propagate changes correctly.
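Once every cell carries its dependency set, recalculation order is a topological sort of the dependency graph. Here is a minimal sketch using the standard library's `graphlib` (Python 3.9+); `recalculate` and `affected_by` are hypothetical helpers, and the sketch assumes each cell is stored under its lowercase name as a (compiled right-hand-side expression, dependency set) pair, with referenced but empty cells counting as 0.

```python
import graphlib


def recalculate(cells, functions=None):
    """Hypothetical helper: evaluate every cell once, dependencies first.

    `cells` maps a lowercase cell name to (compiled_expression, dependency_set).
    Raises graphlib.CycleError if cells refer to each other in a loop.
    """
    namespace = dict(functions or {})
    # graphlib expects node -> predecessors, which is exactly the dependency set.
    graph = {name: deps for name, (_expr, deps) in cells.items()}
    values = {}
    for name in graphlib.TopologicalSorter(graph).static_order():
        if name in cells:
            values[name] = eval(cells[name][0], namespace, values)
        else:
            values[name] = 0  # referenced but empty cell (an assumption)
    return values


def affected_by(cells, changed):
    """Hypothetical helper: every cell depending, directly or not, on `changed`."""
    affected, frontier = set(), {changed}
    while frontier:
        frontier = {
            name for name, (_expr, deps) in cells.items()
            if deps & frontier and name not in affected
        }
        affected |= frontier
    return affected
```

Note that the example above, `A1=SUM(A1:A3)*2`, makes A1 depend on itself, so this sketch would reject it with `CycleError`.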

Here's the compiler... in about 60 lines of Python 2. And since the whole point of this language is to track dependencies... let's call it Traxter.

Of course, this is a toy right now. But it's a toy with potential!

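from pprint import pprint
from aperiot.parsergen import build_parser
import aperiot
import cellutils  # helper module providing cellrange() (see footnote 2)

# Cell names referenced by the assignment currently being compiled.
dependencies=set()

# Binary operators: compile the operands and join them with the operator.
def addOp(*args):
        return '+'.join([compile_token(a) for a in args])
def mulOp(*args):
        return '*'.join([compile_token(a) for a in args])
def subOp(*args):
        return '-'.join([compile_token(a) for a in args])
def divOp(*args):
        return '/'.join([compile_token(a) for a in args])

# Parenthesized subexpression: keep the parentheses.
def groupOp(*args):
        return '(%s)'%compile_token(args[0])

# Function call: emit the function name as-is and compile each argument.
def funcOp(*args):
        return '%s(%s)'%(args[0].symbolic_name,
                         ','.join([compile_token(a) for a in args[1]]))

# Range: expand it into a comma-separated list of cells (A1:A3 -> a1,a2,a3).
def rangeOp(*args):
        c1=args[0].symbolic_name
        c2=args[1].symbolic_name
        return ','.join([compile_token(a) for a in cellutils.cellrange(c1,c2)])

# Tree node tag -> function that compiles that node.
operators={'+':addOp,
           '-':subOp,
           '*':mulOp,
           '/':divOp,
           'group':groupOp,
           'funcall':funcOp,
           'range':rangeOp
           }


def compile_token(token):
        # Cell references become lowercase Python names and are recorded.
        if isinstance(token,aperiot.lexer.Identifier):
                v=token.symbolic_name.lower()
                dependencies.add(v)
                return v
        # Tree nodes: dispatch on the tag, passing the children as arguments.
        if isinstance(token,(list,tuple)):
                return operators[token[0]](*token[1:])
        # Anything else (numbers) is emitted literally.
        return str(token)

# Compile one (target, expression) pair into a Python assignment statement.
def compile_assignment(tokens):
        target=tokens[0].symbolic_name
        compiled=compile_token(tokens[1])
        return '%s=%s;'%(target,compiled)


myparser = build_parser('traxter')
t='A1=SUM(A1:A7)*2;A3=2+2;'
assign_list=myparser.parse(t)
pprint (assign_list)

compiled=[]
for assignment in assign_list:
        # Rebinding the global works because this loop runs at module level:
        # compile_token() always sees the fresh set for this assignment.
        dependencies=set()
        c=compile_assignment(assignment)
        compiled.append([c,dependencies])

print compiled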

[1] You may be asking yourself: what the heck is Aperiot? Or: why the heck Aperiot? Well... I had never heard of it until six hours ago, and I just wrote a DSL using it. That means it's worth knowing.

[2] cellrange() is left as an exercise for the reader because my current implementation is shameful ;-)
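For the curious, here is one possible `cellrange` sketch in Python 3, not the implementation the footnote refers to. It assumes A1-style names (column letters followed by a row number), expands rectangular ranges row by row, and returns plain lowercase strings. The compiler above only records aperiot `Identifier` tokens as dependencies, so plugging this in would mean wrapping these names or teaching `compile_token` about plain strings; `column_to_number` and `number_to_column` are hypothetical helpers.

```python
import re

# Hypothetical sketch: A1-style cell names, letters then a row number.
CELL_RE = re.compile(r"([A-Za-z]+)(\d+)")


def column_to_number(letters):
    """'A' -> 1, 'Z' -> 26, 'AA' -> 27 (bijective base 26)."""
    number = 0
    for ch in letters.upper():
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def number_to_column(number):
    """Inverse of column_to_number: 1 -> 'A', 27 -> 'AA'."""
    letters = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def cellrange(c1, c2):
    """Expand a rectangular range such as ('A1', 'B2') into lowercase cell names.

    Corners may come in any order; cells are listed row by row.
    """
    m1, m2 = CELL_RE.fullmatch(c1), CELL_RE.fullmatch(c2)
    if not (m1 and m2):
        raise ValueError(f"not a cell range: {c1}:{c2}")
    col1, col2 = sorted((column_to_number(m1.group(1)), column_to_number(m2.group(1))))
    row1, row2 = sorted((int(m1.group(2)), int(m2.group(2))))
    return [
        f"{number_to_column(col)}{row}".lower()
        for row in range(row1, row2 + 1)
        for col in range(col1, col2 + 1)
    ]
```

Sorting the corners means `cellrange("B2", "A1")` and `cellrange("A1", "B2")` both give `['a1', 'b1', 'a2', 'b2']`.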


Contents © 2000-2020 Roberto Alsina