New software project: Stupid Sheet

2007-05-24 18:02

Adding something else to my plate is probably not a very good idea, but what the heck, I can make it sleep another three years if I lose interest.

So: I am writing a real spreadsheet in Python.

Probably never going to be useful for corporations, but it should be at least as featureful as Google's and it should be amazingly small.

Here are the components:

Traxter: my spreadsheet-formula-like-language with dependency tracking that compiles to python.
PyQt (hey, it has a grid widget)
Python (Of course)

The status right now:

It's almost as functional as it was 2.5 years ago
Except for broken relative cells.
But with the beginning of a real formula language.
With automatic recalculation and cyclic dependency checks.
Adding things is dead simple. Here's the implementation of SUM (not uploaded yet, though):

def sum(*args):
  ac=0
  for v in args:
    ac+=v
  return ac

All the range stuff happens behind the scenes (although you may get a function called with thousands of args... I wonder how well Python handles that).
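A quick way to probe the "thousands of args" question is a sketch like the one below. It uses Python 3 syntax, and the name `total` is hypothetical (chosen only to avoid shadowing the builtin `sum`); it is not the code from the project. As far as I know, CPython simply packs the unpacked values into a tuple, so a few thousand arguments should not be a problem.

```python
def total(*args):
    # Same accumulation loop as the SUM implementation above.
    ac = 0
    for v in args:
        ac += v
    return ac

# Simulate a range that expands to ten thousand cell values.
values = list(range(1, 10001))
result = total(*values)
print(result)  # 50005000
```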

You can check it out at the Google Code project (use SVN).

The python spreadsheet: Another look (Traxter DSL)

2007-05-23 18:31

I apologize in advance for any ugly amateurism in this post. It's my first attempt at a domain specific language :-)

Yesterday I posted about using PyCells to write a spreadsheet in Python.

Sadly, I can't figure out the problem with my code, and the PyCells mailing list seems to be pretty much dead.

So, I started thinking... what other ways are there to achieve my goal? And I decided to go medieval on this problem.

By that I mean that I will do it the most traditional way possible... with a twist.

The traditional way is, of course, to write one or more of lexer/parser/interpreter/compiler for the formula language.

Mind you, I don't intend to do anything complete, much less Excel-compatible (see Excel formula parsers are hell in this same blog).

So, let's start with a toy language, supporting the following:

Assignment to a variable
Classic 4-op arithmetic
Function calls
Cell ranges

That's enough for a toy spreadsheet, and it should be easy to extend.

Here's a description of the grammar for such a language, written using Aperiot [1]:

# This is a simple language for arithmetic expressions

numbers
     number

operators
     plus   "+"
     times  "*"
     minus  "-"
     div    "/"
     equal  "="
     colon ":"
     comma ","
     semicolon ";"

brackets
     lpar  "("
     rpar  ")"


identifiers
     label

start
     LIST

rules

LIST             -> ASSIGNMENT                : "[$1]"
                  | ASSIGNMENT semicolon LIST : "[$1]+$3"
                  | ASSIGNMENT semicolon : "[$1]"

ASSIGNMENT       -> label equal EXPR : "($1,$3)"

ARGLIST          -> ARG comma ARGLIST : "[$1]+$3"
                  | ARG          : "[$1]"

ARG              -> RANGE       : "$1"
                  | EXPR        : "$1"
                  | label       : "$1"

EXPR             -> TERM              : "$1"
                  | TERM plus EXPR    : "(\'+\',$1,$3)"
                  | TERM minus EXPR   : "(\'-\',$1,$3)"

TERM             -> FACTOR               : "$1"
                  | FACTOR times TERM    : "(\'*\',$1,$3)"
                  | FACTOR div TERM      : "(\'/\',$1,$3)"


FACTOR           -> number           : "$1.val()"
                  | lpar EXPR rpar  : "(\'group\',$2)"
                  | FUNCALL     : "$1"
                  | label               : "$1"
                  | minus FACTOR    : "-$2"

FUNCALL          ->  label lpar ARGLIST rpar : "(\'funcall\',$1,$3)"

RANGE            -> label colon label   : "(\'range\',$1,$3)"

This transforms this:

A1=SUM(A1:A7)*2;
A3=2+2;

Into this:

[(<aperiot.lexer.Identifier instance at 0xb7af10ac>,
  ('*',
   ('funcall',
    <aperiot.lexer.Identifier instance at 0xb7af142c>,
    [('range',
      <aperiot.lexer.Identifier instance at 0xb7af15cc>,
      <aperiot.lexer.Identifier instance at 0xb7af144c>)]),
   2)),
 (<aperiot.lexer.Identifier instance at 0xb7b4c72c>, ('+', 2, 2))]

Which is sort of a tree with all the expressions in prefix notation in it.
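To make the shape of that tree concrete, here is a minimal sketch of an interpreter that walks it directly. It is only an illustration under a few assumptions: plain strings stand in for the aperiot Identifier objects, ranges are left out to keep it short, and `evaluate`, `cells` and `functions` are hypothetical names. It is written in Python 3.

```python
import operator

# Binary operators that appear as the first element of a tree node.
BINARY_OPS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,
}

def evaluate(node, cells, functions):
    """Evaluate a prefix-notation tree against a dict of cell values."""
    if isinstance(node, str):          # a cell label (stand-in for Identifier)
        return cells[node]
    if not isinstance(node, tuple):    # a plain number
        return node
    tag = node[0]
    if tag in BINARY_OPS:
        left = evaluate(node[1], cells, functions)
        right = evaluate(node[2], cells, functions)
        return BINARY_OPS[tag](left, right)
    if tag == 'group':
        return evaluate(node[1], cells, functions)
    if tag == 'funcall':
        args = [evaluate(arg, cells, functions) for arg in node[2]]
        return functions[node[1]](*args)
    raise ValueError('unknown node: %r' % (tag,))

tree = ('*', ('funcall', 'SUM', ['A2', 'A3']), 2)
cells = {'A2': 3, 'A3': 4}
functions = {'SUM': lambda *args: sum(args)}
result = evaluate(tree, cells, functions)
print(result)  # 14
```

One thing a direct evaluator like this makes visible: the grammar's rules recurse on the right, so 1-2-3 would come out of the parser as ('-', 1, ('-', 2, 3)) and be evaluated as 1-(2-3).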

Now, here is the twist: I will "compile" this tree into... Python code. So I can use eval to do the evaluation, just like in the original Python spreadsheet recipe.

So this is sort of a preprocessor:

The user writes Excel-like formulas.
The spreadsheet stores python code obtained through compilation.
The spreadsheet evals the python code.

Of course we have the same problem as usual: cell dependencies, which is the reason why I started playing with PyCells in the first place!

But... well, here's another trick: since I am compiling, I know whenever there is a variable referenced in the code. And I can remember them :-)

So, I can turn this:

A1=SUM(A1:A3)*2;
A3=2+2;

Into this:

[['A1=SUM(a1,a2,a3)*2;', set(['a1', 'a3', 'a2'])],
 ['A3=2+2;', set([])]]

The "compiled" python code and a dependency set. And voila, this spreadsheet will propagate correctly.
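How the dependency sets could drive propagation is not shown in the post, so here is a minimal sketch under stated assumptions: the class `DependencySheet` and its methods are hypothetical, only the right-hand side of each compiled assignment is stored, cells are set before anything refers to them, and recalculation is the naive recursive kind. It is written in Python 3.

```python
class DependencySheet:
    """Toy sheet: stores compiled code plus a dependency set per cell."""

    def __init__(self, functions=None):
        self.functions = functions or {}  # names visible to formulas, e.g. SUM
        self.code = {}     # cell -> Python expression (right-hand side only)
        self.deps = {}     # cell -> set of cells the expression reads
        self.values = {}   # cell -> last computed value

    def _reaches(self, start, target):
        # True if `target` can be reached from `start` by following deps.
        stack, seen = [start], set()
        while stack:
            cell = stack.pop()
            if cell == target:
                return True
            if cell in seen:
                continue
            seen.add(cell)
            stack.extend(self.deps.get(cell, ()))
        return False

    def set(self, cell, code, deps):
        # Refuse a formula whose dependencies lead back to the cell itself.
        if any(self._reaches(dep, cell) for dep in deps):
            raise ValueError('cyclic dependency through %s' % cell)
        self.code[cell] = code
        self.deps[cell] = set(deps)
        self._recalculate(cell)

    def _recalculate(self, cell):
        # Evaluate this cell, then every cell that lists it as a dependency.
        self.values[cell] = eval(self.code[cell], dict(self.functions), self.values)
        for other, deps in self.deps.items():
            if cell in deps:
                self._recalculate(other)

sheet = DependencySheet({'SUM': lambda *args: sum(args)})
sheet.set('a1', '5', set())
sheet.set('a2', 'a1*2', {'a1'})
sheet.set('a3', 'SUM(a1,a2)*2', {'a1', 'a2'})
sheet.set('a1', '7', set())   # a2 and a3 are recalculated automatically
print(sheet.values)
```

A formula such as the A1=SUM(A1:A3)*2 used in the example above refers to its own cell, so in this sketch `set` would reject it with a ValueError.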

Here's the compiler... in about 60 lines of python [2]. And since the whole point of this language is to track dependencies... let's call it Traxter.

Of course, this is a toy right now. But it's a toy with potential!

from pprint import pprint
from aperiot.parsergen import build_parser
import aperiot
import cellutils
import sys

dependencies=set()

def addOp(*args):
        return '+'.join([compile_token(a) for a in args])
def mulOp(*args):
        return '*'.join([compile_token(a) for a in args])
def subOp(*args):
        return '-'.join([compile_token(a) for a in args])
def divOp(*args):
        return '/'.join([compile_token(a) for a in args])

def groupOp(*args):
        return '(%s)'%compile_token(args[0])

def funcOp(*args):
        return '%s(%s)'%(args[0].symbolic_name,
                         ','.join([compile_token(a) for a in args[1]]))

def rangeOp(*args):
        c1=args[0].symbolic_name
        c2=args[1].symbolic_name
        return ','.join([compile_token(a) for a in cellutils.cellrange(c1,c2)])

operators={'+':addOp,
           '-':subOp,
           '*':mulOp,
           '/':divOp,
           'group':groupOp,
           'funcall':funcOp,
           'range':rangeOp
           }


def compile_token(token):
        if isinstance (token,aperiot.lexer.Identifier):
                v=token.symbolic_name.lower()
                dependencies.add(v)
                return v
        if isinstance(token,list) or isinstance(token,tuple):
            return apply(operators[token[0]],token[1:])
        return str(token)

def compile_assignment(tokens):
        target=tokens[0].symbolic_name
        compiled=compile_token(tokens[1])
        return '%s=%s;'%(target,compiled)


myparser = build_parser('traxter')
t='A1=SUM(A1:A7)*2;A3=2+2;'
assign_list=myparser.parse(t)
pprint (assign_list)

compiled=[]
for assignment in assign_list:
        dependencies=set()
        c=compile_assignment(assignment)
        compiled.append([c,dependencies])

print compiled
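The cellutils module used by rangeOp is not shown in this post. Purely as an illustration, a `cellrange` helper could look like the sketch below. Everything in it is an assumption: the helper names, the column-by-column ordering, and the choice to return lowercase strings (the real helper may well return something else, for instance objects that compile_token registers as dependencies). It is written in Python 3.

```python
import re

def split_label(label):
    # 'B12' -> ('B', 12); raises ValueError for anything else.
    match = re.match(r'^([A-Za-z]+)([0-9]+)$', label)
    if not match:
        raise ValueError('not a cell label: %r' % (label,))
    return match.group(1).upper(), int(match.group(2))

def column_to_index(letters):
    # 'A' -> 1, 'Z' -> 26, 'AA' -> 27
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord('A') + 1)
    return index

def index_to_column(index):
    # Inverse of column_to_index: 1 -> 'A', 27 -> 'AA'
    letters = ''
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord('A') + rem) + letters
    return letters

def cellrange(first, last):
    # Expand a rectangular range into lowercase labels, column by column.
    col1, row1 = split_label(first)
    col2, row2 = split_label(last)
    c1, c2 = sorted((column_to_index(col1), column_to_index(col2)))
    r1, r2 = sorted((row1, row2))
    return [index_to_column(c).lower() + str(r)
            for c in range(c1, c2 + 1)
            for r in range(r1, r2 + 1)]

print(cellrange('A1', 'A3'))  # ['a1', 'a2', 'a3']
```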

PyCells: The Python SpreadSheet redux

2007-05-22 10:18

In 2004 I saw a recipe about how to make a "spreadsheet" in Python in 10 lines of code:

class SpreadSheet:
    _cells = {}
    tools = {}
    def __setitem__(self, key, formula):
        self._cells[key] = formula
    def getformula(self, key):
        return self._cells[key]
    def __getitem__(self, key ):
        return eval(self._cells[key], SpreadSheet.tools, self)

It's shocking. And it works, too:

>>> from math import sin, pi
>>> SpreadSheet.tools.update(sin=sin, pi=pi, len=len)
>>> ss = SpreadSheet()
>>> ss['a1'] = '5'
>>> ss['a2'] = 'a1*6'
>>> ss['a3'] = 'a2*7'
>>> ss['a3']
210
>>> ss['b1'] = 'sin(pi/4)'
>>> ss['b1']
0.70710678118654746
>>> ss.getformula('b1')
'sin(pi/4)'

I was so awed, I wrote a PyQt version. Of course there is a catch in that code: it sucks if you are trying to write a spreadsheet with it.

Why? Because it doesn't store results, but only formulas.

For example:

A1=2
A2=A1*2

If you ask for the value of A2, you get 4. If you set A1 to 7, what's the value of A2?

Well, it's nothing yet, because it's only calculated when you ask for it. But suppose you are trying to display that sheet... you need to know A2's value changed when you set A1!

That's cell dependencies, and while that simple code handles them in a way, it totally sucks in another.
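The "handles them in a way" part can be made visible with a small variation on the recipe. This is only a sketch, not the code from the PyQt version: the class `TrackingSheet` and its `reads` attribute are hypothetical, and it is written in Python 3. It records which cells each formula reads while it is being evaluated, which is the information a display would need in order to know what to refresh.

```python
class TrackingSheet:
    """Like the recipe, but remembers which cells each formula reads."""

    def __init__(self):
        self.formulas = {}
        self.reads = {}    # cell -> set of cells read the last time it ran
        self._stack = []   # cells currently being evaluated

    def __setitem__(self, key, formula):
        self.formulas[key] = formula

    def __getitem__(self, key):
        if key not in self.formulas:
            raise KeyError(key)   # lets eval fall back to globals/builtins
        if self._stack:
            # Whoever is being evaluated right now depends on `key`.
            self.reads[self._stack[-1]].add(key)
        self.reads[key] = set()
        self._stack.append(key)
        try:
            return eval(self.formulas[key], {}, self)
        finally:
            self._stack.pop()

ss = TrackingSheet()
ss['a1'] = '5'
ss['a2'] = 'a1*6'
ss['a3'] = 'a2*7'
print(ss['a3'])        # 210
print(ss.reads['a3'])  # {'a2'}
print(ss.reads['a2'])  # {'a1'}
```

Values are still only computed on demand here; the sketch just shows that the dependency information is available at evaluation time.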

So, I went ahead and coded around it successfully. Of course the code was not so pretty anymore (although a large part of the ugliness is just for making it work with Python 2.3 and relative cells).

Then yesterday, while looking at the Excel formula parser madness, I saw a reference to PyCells, a Python port of Cells from CLOS.

Here is a blog commenting on PyCells:

It basically takes the concept of a cell in a spreadsheet that get updated automatically to programming where there are a lot of internal data states that are dependent on one another in a chain, or a complex graph of dependencies. Like, the color of a button depends on whether you selected a radio button or not. Or, shut down the motor if the sensor reads above 100 degrees (example given in text).

Almost everyone uses that analogy... however, no matter how hard I looked, I couldn't find anyone who had actually tried writing a spreadsheet using PyCells! Not even as an example!

So here it is:

import cells

class Cell(cells.Model):
    formula=cells.makecell(value='')
    @cells.fun2cell()
    def value(self,prev):
        print "eval ",self.formula
        return eval(self.formula, {}, self.ss)
    def __init__(self, ss, *args, **kwargs):
        self.ss=ss
        cells.Model.__init__(self, *args, **kwargs)


class ssDict:
        def __init__(self):
                self.ss={}

        def __getitem__(self,key):
                return self.ss[key].value

        def __setitem__(self,key,v):
                if not self.ss.has_key(key):
                        c=Cell(self)
                        c.formula=v
                        self.ss[key]=c
                else:
                        self.ss[key].formula=v

if __name__ == "__main__":
        ss=ssDict()
        ss['a1'] ='5'

        ss['a2']='2*a1'
        print "a1: ", ss['a1']
        print "a2: ", ss['a2']

        ss['a1'] = '7'
        print "a1: ", ss['a1']
        print "a2: ", ss['a2']

And here you can see it running:

[ralsina@monty cells]$ python ctest.py
a1:  eval  5
5
a2:  eval  2*a1
10
eval  7
eval  2*a1
a1:  7
a2:  14

See how when I set a1 to 7 I get "eval 7" and "eval 2*a1"? That's because it's propagating changes the right way. And that's why this would work as a basis for a spreadsheet.

UPDATE

It seems there is a bug either in PyCells, or in my example, or something, because it breaks pretty easily, even if the dependency chain is only two cells long:

a1:  eval  5
5
a2:  eval  2*a1
10
a3:  eval  2*a2
20
eval  7
eval  2*a1
a1:  7
a2:  14
a3:  20

In this example, I am setting A3 to 2*A2, and when I update A1, A3 is not updated. Further research is needed.

Ralsina.Me — Roberto Alsina's website
