A graph is a graph is a graph.

2007-05-25 19:06

After hacking for about two hours the cell dependencies yesterday using dicts, I found myself saying "how can I check if the dependency graph is cyclical?"

And of course it hit me. That's the dependency graph. Use a graph library!

I looked for about 2 minutes before finding one that's probably not optimal, but is damn simple and in the public domain: graph_lib.py by Nathan Denny.

First interesting data point, the first two lines on the file (yes, I found out there is a later version):

#--Version 1.0.0
#--Nathan Denny, May 27, 1999

Yup. In two days this piece of code is turning 8 years old and untouched. But it works just fine and dandy!

Here's the piece of code that makes the engine run which has grown from a humble 10 LOC to almost a whooping 40 LOC:

class SpreadSheet(QtCore.QObject):
    _cells = {}
    tools = {}
    graph=Graph()

    def __init__(self,parent):
        QtCore.QObject.__init__(self,parent)
        for name in dir(functions):
                self.tools[name]=eval('functions.'+name)


    def __setitem__(self, key, formula):
        key=key.lower()
        c=traxcompile('%s=%s;'%(key,formula))
        self._cells[key] = [c[key][0],False,compile(c[key][0],"Formula for %s"%key,'eval')]

        # Dependency graph
        if not self.graph.has_node(key):
                self.graph.add_node(key)
        for edge in self.graph.in_arcs(key):
                self.graph.delete_edge(edge)
        for cell in c[key][1]:
                self.graph.add_edge(cell,key)
        try:
                print 'GRAPH(TOPO): ',self.graph.topological_sort()
                self._cells[key][1]=False
                print 'GRAPH(BFS) : ',self.graph.bfs(key)
                for cell in self.graph.bfs(key)[1:]:
                        self.emit(QtCore.SIGNAL('changed'),(cell))
        except Graph_topological_error:
                # We made the graph cyclic
                # So, mark this cell as evil
                self._cells[key][1]=True
                # And remove all incoming edges to go back to
                # status quo
                for edge in self.graph.in_arcs(key):
                        self.graph.delete_edge(edge)

    def getformula(self, key):
        key=key.lower()
        return self._cells[key][0]
    def __getitem__(self, key ):
        print self._cells[key]
        if self._cells[key][1]:
                return "ERROR: cyclic dependency"
        else:
                print 'evaluating [%s]: '%key,type(self._cells[key][0]),self._cells[key][0]
              return eval(self._cells[key][0], self.tools, self)

This engine supports:

User defined functions (just add them in formula.py)
Cell dependencies with detection of cicles, of any length
Unlimited size spreadsheets
Notification of cell changes to other modules
Automatic recalculation
A custom formula language (ok, that's not in this piece of code ;-)

The whole project is now 1167 LOC, of which 591 are generated or graph_lib, which means everything including the traxter "compiler" and the UI is less than 600 lines of code.

Not too shabby.

My goal is a functional spreadsheet with a working GUI and supporting the most common functions and pieces of the formula language in .... 2000 LOC, and I am willing to work on it for about two weeks.

Let's see how it turns out. Of course anyone with 5 free minutes can contribute his favourite spreadsheet function (I already have sum ;-)

New software project: Stupid Sheet

2007-05-24 18:02

Adding something else to my plate is probably not a very good idea, but what the heck, I can make it sleep another three years if I lose interest.

So: I am writing a real spreadsheet in python.

Probably never going to be useful for corporations, but it should be at least as featureful as Google's and it should be amazingly small.

Here are the components:

Traxter: my spreadsheet-formula-like-language with dependency tracking that compiles to python.
PyQt (hey, it has a grid widget)
Python (Of course)

The status right now:

It's almost as functional as it was 2.5 years ago
Except for broken relative cells.
But with the beginning of a real formula language.
With automatic recalculation and cyclic dependency checks.
Adding things is dead simple. Here's the implementation of SUM (not uploaded yet, though):

def sum(*args):
  ac=0
  for v in args:
    ac+=v
  return ac

All the range stuff happens behind the scenes (although you may get a function called with thousands of args... I wonder how well Python handles that).

You can check it at the google code project (Use the SVN).

The python spreadsheet: Another look (Traxter DSL)

2007-05-23 18:31

I apologize in advance for any ugly amateurism in this post. It's my first attempt at a domain specific language :-)

Yesterday I posted about using PyCells to write a spreadsheet in Python.

Sadly, I can't figure out the problem with my code, and the PyCells mailing list seems to be pretty much dead.

So, I started thinking... what other ways are to achieve my goal? And decided to go medieval on this problem.

By that I mean that I will do it the most traditional way possible... with a twist.

The traditional way is, of course, to write one or more of lexer/parser/interpreter/compiler for the formula language.

Mind you, I don't intend to do anything complete, much less Excel-compatible (see Excel formula parsers are hell in this same blog.

So, let's start with a toy language, supporting the following:

Assignment to a variable
Classic 4-op arithmetics.
Function calls
Cell ranges

That's enough for a toy spreadsheet, and it should be easy to extend.

Here's a description of the grammar for such a language, written using Aperiot [1]:

# This is a simple language for arithmetic expressions

numbers
     number

operators
     plus   "+"
     times  "*"
     minus  "-"
     div    "/"
     equal  "="
     colon ":"
     comma ","
     semicolon ";"

brackets
     lpar  "("
     rpar  ")"


identifiers
     label

start
     LIST

rules

LIST             -> ASSIGNMENT                : "[$1]"
                  | ASSIGNMENT semicolon LIST : "[$1]+$3"
                  | ASSIGNMENT semicolon : "[$1]"

ASSIGNMENT       -> label equal EXPR : "($1,$3)"

ARGLIST          -> ARG comma ARGLIST : "[$1]+$3"
                  | ARG          : "[$1]"

ARG              -> RANGE       : "$1"
                  | EXPR        : "$1"
                  | label       : "$1"

EXPR             -> TERM              : "$1"
                  | TERM plus EXPR    : "(\'+\',$1,$3)"
                  | TERM minus EXPR   : "(\'-\',$1,$3)"

TERM             -> FACTOR               : "$1"
                  | FACTOR times TERM    : "(\'*\',$1,$3)"
                  | FACTOR div TERM      : "(\'/\',$1,$3)"


FACTOR           -> number           : "$1.val()"
                  | lpar EXPR rpar  : "(\'group\',$2)"
                  | FUNCALL     : "$1"
                  | label               : "$1"
                  | minus FACTOR    : "-$2"

FUNCALL          ->  label lpar ARGLIST rpar : "(\'funcall\',$1,$3)"

RANGE            -> label colon label   : "(\'range\',$1,$3)"

This transforms this:

A1=SUM(A1:A7)*2;
A3=2+2;

Into this:

[(<aperiot.lexer.Identifier instance at 0xb7af10ac>,
  ('*',
   ('funcall',
    <aperiot.lexer.Identifier instance at 0xb7af142c>,
    [('range',
      <aperiot.lexer.Identifier instance at 0xb7af15cc>,
      <aperiot.lexer.Identifier instance at 0xb7af144c>)]),
   2)),
 (<aperiot.lexer.Identifier instance at 0xb7b4c72c>, ('+', 2, 2))]

Which is sort of a tree with all the expressions in prefix notation in them.

Now, here is the twist: I will "compile" this tree into.... python code. So I can use eval to do the evaluation, just like in the original python spreadsheet recipe.

So this is sort of a preprocessor:

The user writes excel-like formulas.
The spreadsheet stores python code obtained through compilation.
The spreadsheet evals the python code.

Of course we have the same problem as usual: cell dependencies, which is the reason why I started playing with PyCells in the first place!

But... well, here's another trick: since I am compiling, I know whenever there is a variable referenced in the code. And I can remember them :-)

So, I can turn this:

A1=SUM(A1:A3)*2;
A3=2+2;

Into this:

[['A1=SUM(a1,a2,a3)*2;', set(['a1', 'a3', 'a2'])],
 ['A3=2+2;', set([])]]

The "compiled" python code and a dependency set. And voila, this spreadsheet will propagate correctly.

Here's the compiler... in about 60 lines of python [2]. And since the whole point of this language is to track dependencies... let's call it Traxter.

Of course, this is a toy right now. But it's a toy with potential!

from pprint import pprint
from aperiot.parsergen import build_parser
import aperiot
import cellutils
import sys

dependencies=set()

def addOp(*args):
        return '+'.join([compile_token(a) for a in args])
def mulOp(*args):
        return '*'.join([compile_token(a) for a in args])
def subOp(*args):
        return '-'.join([compile_token(a) for a in args])
def divOp(*args):
        return '/'.join([compile_token(a) for a in args])

def groupOp(*args):
        return '(%s)'%compile_token(args[0])

def funcOp(*args):
        return '%s(%s)'%(args[0].symbolic_name,
                         ','.join([compile_token(a) for a in args[1]]))

def rangeOp(*args):
        c1=args[0].symbolic_name
        c2=args[1].symbolic_name
        return ','.join([compile_token(a) for a in cellutils.cellrange(c1,c2)])

operators={'+':addOp,
           '-':subOp,
           '*':mulOp,
           '/':divOp,
           'group':groupOp,
           'funcall':funcOp,
           'range':rangeOp
           }


def compile_token(token):
        if isinstance (token,aperiot.lexer.Identifier):
                v=token.symbolic_name.lower()
                dependencies.add(v)
                return v
        if isinstance(token,list) or isinstance(token,tuple):
            return apply(operators[token[0]],token[1:])
        return str(token)

def compile_assignment(tokens):
        target=tokens[0].symbolic_name
        compiled=compile_token(tokens[1])
        return '%s=%s;'%(target,compiled)


myparser = build_parser('traxter')
t='A1=SUM(A1:A7)*2;A3=2+2;'
assign_list=myparser.parse(t)
pprint (assign_list)

compiled=[]
for assignment in assign_list:
        dependencies=set()
        c=compile_assignment(assignment)
        compiled.append([c,dependencies])

print compiled

PyCells: The Python SpreadSheet redux

2007-05-22 10:18

In 2004 I saw a recipe about how to make a "spreadsheet" in python in 10 lines of code:

class SpreadSheet:
    _cells = {}
    tools = {}
    def __setitem__(self, key, formula):
        self._cells[key] = formula
    def getformula(self, key):
        return self._cells[key]
    def __getitem__(self, key ):
        return eval(self._cells[key], SpreadSheet.tools, self)

It's shocking. And it works, too:

>>> from math import sin, pi
>>> SpreadSheet.tools.update(sin=sin, pi=pi, len=len)
>>> ss = SpreadSheet()
>>> ss['a1'] = '5'
>>> ss['a2'] = 'a1*6'
>>> ss['a3'] = 'a2*7'
>>> ss['a3']
210
>>> ss['b1'] = 'sin(pi/4)'
>>> ss['b1']
0.70710678118654746
>>> ss.getformula('b1')
'sin(pi/4)'

I was so awed, I wrote a PyQt version . Of course there is a catch in that code: it sucks if you are trying to write a spreadsheet with it.

Why? Because it doesn't store results, but only formulas.

For example:

A1=2
A2=A1*2

If you ask for the value of A2, you get 4. If you set A1 to 7, what's the value of A2?

Well, it's nothing yet, because it's only calculated when you ask for it. But suppose you are trying to display that sheet... you need to know A2's value changed when you set A1!

That's cell dependencies, and while that simple code handles them in a way, it totally sucks in another.

So, I went ahead and coded around it successfully. Of course the code was not so pretty anymore (although a large part of the uglyness is just for making it work with Python 2.3 and relative cells).

Then yesterday, while looking at the excel formula parser madness I saw a reference to PyCells, a python port of Cells from CLOS.

Here is a blog commenting on Pycells:

It basically takes the concept of a cell in a spreadsheet that get updated automatically to programming where there are a lot of internal data states that are dependent on one another in a chain, or a complex graph of dependencies. Like, the color of a button depends on whether you selected a radio button or not. Or, shut down the motor if the sensor reads above 100 degrees (example given in text).

Almost everyone uses that analogy... however, no matter how hard I looked, I couldn't find anyone who had actually tried writing a spreadsheet using PyCells! Not even as an example!

So here it is:

import cells

class Cell(cells.Model):
    formula=cells.makecell(value='')
    @cells.fun2cell()
    def value(self,prev):
        print "eval ",self.formula
        return eval(self.formula, {}, self.ss)
    def __init__(self, ss, *args, **kwargs):
        self.ss=ss
        cells.Model.__init__(self, *args, **kwargs)


class ssDict:
        def __init__(self):
                self.ss={}

        def __getitem__(self,key):
                return self.ss[key].value

        def __setitem__(self,key,v):
                if not self.ss.has_key(key):
                        c=Cell(self)
                        c.formula=v
                        self.ss[key]=c
                else:
                        self.ss[key].formula=v

if __name__ == "__main__":
        ss=ssDict()
        ss['a1'] ='5'

        ss['a2']='2*a1'
        print "a1: ", ss['a1']
        print "a2: ", ss['a2']

        ss['a1'] = '7'
        print "a1: ", ss['a1']
        print "a2: ", ss['a2']

And here you can see it running:

[ralsina@monty cells]$ python ctest.py
a1:  eval  5
5
a2:  eval  2*a1
10
eval  7
eval  2*a1
a1:  7
a2:  14

See how when I set a1 to 7 I get "eval 7" and "eval 2*a1"? That's because it's propagating changes the right way. And that's why this would work as a basis for a spreadsheet.

UPDATE

It seems there is a bug wither in PyCells, or in my example, or something, because it breaks pretty easily, if the dependency chain is even two cells:

a1:  eval  5
5
a2:  eval  2*a1
10
a3:  eval  2*a2
20
eval  7
eval  2*a1
a1:  7
a2:  14
a3:  20

In this example, I am setting A3 to 2*A2, and when I update A1, A3 is not updated. Further research is needed.

Excel formula parsers are hell

2007-05-21 20:57

On 2004 I wrote a spreadsheet in python, which was about a 25KB download (compressed). It was pretty functional!.

The main problem that would never let that program be a real application was that it used python as a formula language instead of the traditional Lotus/Excel thing.

Yesterday I decided to look at it again, and see if there was a way around it [1]. My main asset was that I know a fair bit more about parsers now than I did then, and how hard could it be?

Well, it turns out that language is a whole lot harder than I knew!

I found this page about a parser

Here's an example from it:

=IF(R[39]C[11]>65,R[25]C[42],ROUND((R[11]C[11]*IF(OR(AND(R[39]C[11]>=55,
R[40]C[11]>=20),AND(R[40]C[11]>=20,R11C3="YES")),R[44]C[11],R[43]C[11]))+
(R[14]C[11] *IF(OR(AND(R[39]C[11]>=55,R[40]C[11]>=20),AND(R[40]C[11]>=20,
R11C3="YES")), R[45]C[11],R[43]C[11])),0))

Yes, that's a valid Excel formula. No, I have no idea what it does.

And here's some extra nuggets about the language (quoted from the same page):

For some reason Excel uses a comma as the range union operator. This means that its only unambiguous use between ranges is outside of a function call or within a subexpression (i.e., between parentheses).
For some other reason Excel uses a space (or multiple spaces) as the range intersection operator. While not as ambiguous as the comma, it does require some consideration.

And it goes on, and on, and on....

But hey, he wrote a nice parser!

Just for kicks, here's the best BNF I could find.

I really don't envy the KSpread authors if they try to be compatible to this garbage!

But why is this language so gnarly?

Because it has been growing organically for 30 years?
Because of backwards compatibility?
Because it was never designed?
All of the above?

I mean, it can't be intentional! Noone actually meant it to be like this, right?

Ralsina.Me — Roberto Alsina's website

Posts about python (old posts, page 24)