Posts about StupidSheet

Coming soon...

I have managed to create the most bizarre way to write a spreadsheet engine in Python.

I still need to polish some things, but here are the highlights:

  • Your formulas compile to C
  • C is inlined using Instant

Yes, that means you edit a cell in the GUI and you need to wait until gcc compiles the thing.

Is it going to be useful? Probably not. Is it cool? I say yeah.

I will polish it somewhat, create a sttandalone engine, and show it here.

Sometimes, you need to do it the hard way.

You may have noticed no posts about StupidSheet for about a week.

Well, I ran into the limitations of the formula parser I was doing using Aperiot. I just couldn't make it parse this:

A1=IF(A2=B2,1,0)

So, I spent the next week trying one parsing package for Python a day until I could find one I understood and could make it parse that example.

I must say it was educational.

So, now the parser is based on PLY which is pretty much Lex+YACC with a (slightly more) pythonic syntax, and it works.

Yes, it's a bit harder, but by trying to do things simply I was limiting myself too much, and, perhaps underestimating myself.

I am a pretty smart guy, there is no reason I can't understand these things.

Almost a real spreadsheet! (with video)

I was able to hack a bit at StupidSheet this morning, and there are some real advances.

In fact, it's a barely functional spreadsheet already! Hell, it has at least one feature OOcalc lacks, and one both OOcalc and KSpread missed.

Check it out in this video (9.5MB):

Sorry, dead file

Sorry if the audio is out of sync and/or too low. And you probably can't stream it, you need to download it first. And maybe it will 404 on you. If that's the case, wait a few minutes and insist.

Stupid Sheet: Redoing cell displacements

For my spreadsheet project, I had to redo something I had forgotten about: cell displacement. I did that once when the formula language was python.

At the time, I parsed the python using the tokenize module and Ka-Ping Yee's regurgitate.

Python->Tokens->Displace cells->regurgitate->Python

Since I have lots of other things to do, I decided to do it the same way, and wrote the equivalent of regurgitate for Traxter, my formula language.

It turns out it was not really hard, but I had to redo parts of the parser so it kept more information about the source.

After that it was simple, you see, Traxter compiles to Python. That means all I had to do was another (very similar) backend, and there it is, a Traxter-to-Traxter compiler. Or decompiler. Or something. And relative cell references are working again (and now with the right syntax).

A graph is a graph is a graph.

After hacking for about two hours the cell dependencies yesterday using dicts, I found myself saying "how can I check if the dependency graph is cyclical?"

And of course it hit me. That's the dependency graph. Use a graph library!

I looked for about 2 minutes before finding one that's probably not optimal, but is damn simple and in the public domain: graph_lib.py by Nathan Denny.

First interesting data point, the first two lines on the file (yes, I found out there is a later version):

#--Version 1.0.0
#--Nathan Denny, May 27, 1999

Yup. In two days this piece of code is turning 8 years old and untouched. But it works just fine and dandy!

Here's the piece of code that makes the engine run which has grown from a humble 10 LOC to almost a whooping 40 LOC:

class SpreadSheet(QtCore.QObject):
    _cells = {}
    tools = {}
    graph=Graph()

    def __init__(self,parent):
        QtCore.QObject.__init__(self,parent)
        for name in dir(functions):
                self.tools[name]=eval('functions.'+name)


    def __setitem__(self, key, formula):
        key=key.lower()
        c=traxcompile('%s=%s;'%(key,formula))
        self._cells[key] = [c[key][0],False,compile(c[key][0],"Formula for %s"%key,'eval')]

        # Dependency graph
        if not self.graph.has_node(key):
                self.graph.add_node(key)
        for edge in self.graph.in_arcs(key):
                self.graph.delete_edge(edge)
        for cell in c[key][1]:
                self.graph.add_edge(cell,key)
        try:
                print 'GRAPH(TOPO): ',self.graph.topological_sort()
                self._cells[key][1]=False
                print 'GRAPH(BFS) : ',self.graph.bfs(key)
                for cell in self.graph.bfs(key)[1:]:
                        self.emit(QtCore.SIGNAL('changed'),(cell))
        except Graph_topological_error:
                # We made the graph cyclic
                # So, mark this cell as evil
                self._cells[key][1]=True
                # And remove all incoming edges to go back to
                # status quo
                for edge in self.graph.in_arcs(key):
                        self.graph.delete_edge(edge)

    def getformula(self, key):
        key=key.lower()
        return self._cells[key][0]
    def __getitem__(self, key ):
        print self._cells[key]
        if self._cells[key][1]:
                return "ERROR: cyclic dependency"
        else:
                print 'evaluating [%s]: '%key,type(self._cells[key][0]),self._cells[key][0]
              return eval(self._cells[key][0], self.tools, self)

This engine supports:

  • User defined functions (just add them in formula.py)
  • Cell dependencies with detection of cicles, of any length
  • Unlimited size spreadsheets
  • Notification of cell changes to other modules
  • Automatic recalculation
  • A custom formula language (ok, that's not in this piece of code ;-)

The whole project is now 1167 LOC, of which 591 are generated or graph_lib, which means everything including the traxter "compiler" and the UI is less than 600 lines of code.

Not too shabby.

My goal is a functional spreadsheet with a working GUI and supporting the most common functions and pieces of the formula language in .... 2000 LOC, and I am willing to work on it for about two weeks.

Let's see how it turns out. Of course anyone with 5 free minutes can contribute his favourite spreadsheet function (I already have sum ;-)

New software project: Stupid Sheet

Adding something else to my plate is probably not a very good idea, but what the heck, I can make it sleep another three years if I lose interest.

So: I am writing a real spreadsheet in python.

Probably never going to be useful for corporations, but it should be at least as featureful as Google's and it should be amazingly small.

Here are the components:

  • Traxter: my spreadsheet-formula-like-language with dependency tracking that compiles to python.
  • PyQt (hey, it has a grid widget)
  • Python (Of course)

The status right now:

  • It's almost as functional as it was 2.5 years ago
  • Except for broken relative cells.
  • But with the beginning of a real formula language.
  • With automatic recalculation and cyclic dependency checks.
  • Adding things is dead simple. Here's the implementation of SUM (not uploaded yet, though):
def sum(*args):
  ac=0
  for v in args:
    ac+=v
  return ac

All the range stuff happens behind the scenes (although you may get a function called with thousands of args... I wonder how well Python handles that).

You can check it at the google code project (Use the SVN).

The python spreadsheet: Another look (Traxter DSL)

I apologize in advance for any ugly amateurism in this post. It's my first attempt at a domain specific language :-)

Yesterday I posted about using PyCells to write a spreadsheet in Python.

Sadly, I can't figure out the problem with my code, and the PyCells mailing list seems to be pretty much dead.

So, I started thinking... what other ways are to achieve my goal? And decided to go medieval on this problem.

By that I mean that I will do it the most traditional way possible... with a twist.

The traditional way is, of course, to write one or more of lexer/parser/interpreter/compiler for the formula language.

Mind you, I don't intend to do anything complete, much less Excel-compatible (see Excel formula parsers are hell in this same blog.

So, let's start with a toy language, supporting the following:

  • Assignment to a variable
  • Classic 4-op arithmetics.
  • Function calls
  • Cell ranges

That's enough for a toy spreadsheet, and it should be easy to extend.

Here's a description of the grammar for such a language, written using Aperiot [1]:

# This is a simple language for arithmetic expressions

numbers
     number

operators
     plus   "+"
     times  "*"
     minus  "-"
     div    "/"
     equal  "="
     colon ":"
     comma ","
     semicolon ";"

brackets
     lpar  "("
     rpar  ")"


identifiers
     label

start
     LIST

rules

LIST             -> ASSIGNMENT                : "[$1]"
                  | ASSIGNMENT semicolon LIST : "[$1]+$3"
                  | ASSIGNMENT semicolon : "[$1]"

ASSIGNMENT       -> label equal EXPR : "($1,$3)"

ARGLIST          -> ARG comma ARGLIST : "[$1]+$3"
                  | ARG          : "[$1]"

ARG              -> RANGE       : "$1"
                  | EXPR        : "$1"
                  | label       : "$1"

EXPR             -> TERM              : "$1"
                  | TERM plus EXPR    : "(\'+\',$1,$3)"
                  | TERM minus EXPR   : "(\'-\',$1,$3)"

TERM             -> FACTOR               : "$1"
                  | FACTOR times TERM    : "(\'*\',$1,$3)"
                  | FACTOR div TERM      : "(\'/\',$1,$3)"


FACTOR           -> number           : "$1.val()"
                  | lpar EXPR rpar  : "(\'group\',$2)"
                  | FUNCALL     : "$1"
                  | label               : "$1"
                  | minus FACTOR    : "-$2"

FUNCALL          ->  label lpar ARGLIST rpar : "(\'funcall\',$1,$3)"

RANGE            -> label colon label   : "(\'range\',$1,$3)"

This transforms this:

A1=SUM(A1:A7)*2;
A3=2+2;

Into this:

[(<aperiot.lexer.Identifier instance at 0xb7af10ac>,
  ('*',
   ('funcall',
    <aperiot.lexer.Identifier instance at 0xb7af142c>,
    [('range',
      <aperiot.lexer.Identifier instance at 0xb7af15cc>,
      <aperiot.lexer.Identifier instance at 0xb7af144c>)]),
   2)),
 (<aperiot.lexer.Identifier instance at 0xb7b4c72c>, ('+', 2, 2))]

Which is sort of a tree with all the expressions in prefix notation in them.

Now, here is the twist: I will "compile" this tree into.... python code. So I can use eval to do the evaluation, just like in the original python spreadsheet recipe.

So this is sort of a preprocessor:

  • The user writes excel-like formulas.
  • The spreadsheet stores python code obtained through compilation.
  • The spreadsheet evals the python code.

Of course we have the same problem as usual: cell dependencies, which is the reason why I started playing with PyCells in the first place!

But... well, here's another trick: since I am compiling, I know whenever there is a variable referenced in the code. And I can remember them :-)

So, I can turn this:

A1=SUM(A1:A3)*2;
A3=2+2;

Into this:

[['A1=SUM(a1,a2,a3)*2;', set(['a1', 'a3', 'a2'])],
 ['A3=2+2;', set([])]]

The "compiled" python code and a dependency set. And voila, this spreadsheet will propagate correctly.

Here's the compiler... in about 60 lines of python [2]. And since the whole point of this language is to track dependencies... let's call it Traxter.

Of course, this is a toy right now. But it's a toy with potential!

from pprint import pprint
from aperiot.parsergen import build_parser
import aperiot
import cellutils
import sys

dependencies=set()

def addOp(*args):
        return '+'.join([compile_token(a) for a in args])
def mulOp(*args):
        return '*'.join([compile_token(a) for a in args])
def subOp(*args):
        return '-'.join([compile_token(a) for a in args])
def divOp(*args):
        return '/'.join([compile_token(a) for a in args])

def groupOp(*args):
        return '(%s)'%compile_token(args[0])

def funcOp(*args):
        return '%s(%s)'%(args[0].symbolic_name,
                         ','.join([compile_token(a) for a in args[1]]))

def rangeOp(*args):
        c1=args[0].symbolic_name
        c2=args[1].symbolic_name
        return ','.join([compile_token(a) for a in cellutils.cellrange(c1,c2)])

operators={'+':addOp,
           '-':subOp,
           '*':mulOp,
           '/':divOp,
           'group':groupOp,
           'funcall':funcOp,
           'range':rangeOp
           }


def compile_token(token):
        if isinstance (token,aperiot.lexer.Identifier):
                v=token.symbolic_name.lower()
                dependencies.add(v)
                return v
        if isinstance(token,list) or isinstance(token,tuple):
            return apply(operators[token[0]],token[1:])
        return str(token)

def compile_assignment(tokens):
        target=tokens[0].symbolic_name
        compiled=compile_token(tokens[1])
        return '%s=%s;'%(target,compiled)


myparser = build_parser('traxter')
t='A1=SUM(A1:A7)*2;A3=2+2;'
assign_list=myparser.parse(t)
pprint (assign_list)

compiled=[]
for assignment in assign_list:
        dependencies=set()
        c=compile_assignment(assignment)
        compiled.append([c,dependencies])

print compiled
[1] You may be asking yourself:what the heck is Aperiot? Or Why the heck Aperiot? Well... I had never heard of it until 6 hours ago, and I just wrote a DSL using it. That means it's worth knowing.
[2] cellrange() is left as an exercise for the reader because my current implementation is shameful ;-)

PyCells: The Python SpreadSheet redux

In 2004 I saw a recipe about how to make a "spreadsheet" in python in 10 lines of code:

class SpreadSheet:
    _cells = {}
    tools = {}
    def __setitem__(self, key, formula):
        self._cells[key] = formula
    def getformula(self, key):
        return self._cells[key]
    def __getitem__(self, key ):
        return eval(self._cells[key], SpreadSheet.tools, self)

It's shocking. And it works, too:

>>> from math import sin, pi
>>> SpreadSheet.tools.update(sin=sin, pi=pi, len=len)
>>> ss = SpreadSheet()
>>> ss['a1'] = '5'
>>> ss['a2'] = 'a1*6'
>>> ss['a3'] = 'a2*7'
>>> ss['a3']
210
>>> ss['b1'] = 'sin(pi/4)'
>>> ss['b1']
0.70710678118654746
>>> ss.getformula('b1')
'sin(pi/4)'

I was so awed, I wrote a PyQt version . Of course there is a catch in that code: it sucks if you are trying to write a spreadsheet with it.

Why? Because it doesn't store results, but only formulas.

For example:

A1=2
A2=A1*2

If you ask for the value of A2, you get 4. If you set A1 to 7, what's the value of A2?

Well, it's nothing yet, because it's only calculated when you ask for it. But suppose you are trying to display that sheet... you need to know A2's value changed when you set A1!

That's cell dependencies, and while that simple code handles them in a way, it totally sucks in another.

So, I went ahead and coded around it successfully. Of course the code was not so pretty anymore (although a large part of the uglyness is just for making it work with Python 2.3 and relative cells).

Then yesterday, while looking at the excel formula parser madness I saw a reference to PyCells, a python port of Cells from CLOS.

Here is a blog commenting on Pycells:

It basically takes the concept of a cell in a spreadsheet that get updated automatically to programming where there are a lot of internal data states that are dependent on one another in a chain, or a complex graph of dependencies. Like, the color of a button depends on whether you selected a radio button or not. Or, shut down the motor if the sensor reads above 100 degrees (example given in text).

Almost everyone uses that analogy... however, no matter how hard I looked, I couldn't find anyone who had actually tried writing a spreadsheet using PyCells! Not even as an example!

So here it is:

import cells

class Cell(cells.Model):
    formula=cells.makecell(value='')
    @cells.fun2cell()
    def value(self,prev):
        print "eval ",self.formula
        return eval(self.formula, {}, self.ss)
    def __init__(self, ss, *args, **kwargs):
        self.ss=ss
        cells.Model.__init__(self, *args, **kwargs)


class ssDict:
        def __init__(self):
                self.ss={}

        def __getitem__(self,key):
                return self.ss[key].value

        def __setitem__(self,key,v):
                if not self.ss.has_key(key):
                        c=Cell(self)
                        c.formula=v
                        self.ss[key]=c
                else:
                        self.ss[key].formula=v

if __name__ == "__main__":
        ss=ssDict()
        ss['a1'] ='5'

        ss['a2']='2*a1'
        print "a1: ", ss['a1']
        print "a2: ", ss['a2']

        ss['a1'] = '7'
        print "a1: ", ss['a1']
        print "a2: ", ss['a2']

And here you can see it running:

[[email protected] cells]$ python ctest.py
a1:  eval  5
5
a2:  eval  2*a1
10
eval  7
eval  2*a1
a1:  7
a2:  14

See how when I set a1 to 7 I get "eval 7" and "eval 2*a1"? That's because it's propagating changes the right way. And that's why this would work as a basis for a spreadsheet.

UPDATE

It seems there is a bug wither in PyCells, or in my example, or something, because it breaks pretty easily, if the dependency chain is even two cells:

a1:  eval  5
5
a2:  eval  2*a1
10
a3:  eval  2*a2
20
eval  7
eval  2*a1
a1:  7
a2:  14
a3:  20

In this example, I am setting A3 to 2*A2, and when I update A1, A3 is not updated. Further research is needed.

UPDATE 2

Please check the reddit comments. They are very educational!

A *real* programming challenge.

A long time ago, I wrote a piece about how I didn't like kcalc. It contained a very lame pyqt script showing a (IMHO) nicer calculator. Strangely, that lead to two very cool implementations of the concept!

One of them was written in Ruby, the other one in C++. I think that has some potential.

A few months later, I wrote a spreadsheet based on the same concept. Also based on PyQt.

This StupidSheet has some conceptual problems. Like, if you want to import Excel sheets, you would have to rewrite basic in python, so it's not a practical program, but it is a rather nice example showing programming using dynamic languages.

In fact, I used it as such last week at CafeConf.

Now, here's the challenge. If people that know how to write Ruby or Java apps using Qt (or KDE, why not) could write a similar application, we all could write a comparative guide to Qt/KDE programming on different languages.

Since we would all be starting with a not-too-complex, but really non-trivial example, and we would all do the same one, it should be pretty unbiased.

In fact, if you think this example is biased, please propose another one, and do this thing anyway.

You can find StupidSheet here

It has some small bugs (try setting B1 to A1+1 with no value in A1 ;-) but they are easy to fix.

We could remove some features (like the weird pasting stuff) to make the example more didactic.

I hope this gets some answers :-)

CafeConf 2005

Nuevamente este año voy a estar en CafeConf . Es una charla de 45 minutos sobre PyQt el 13 de octubre al mediodía.

La idea: La gente se sorprende cuando uno agarra KHTML, engancha un par de widgets y sale con un navegador web. Se deberian sorprender más de que uno puede partir de una receta de 20 líneas en una página web y terminar con una planilla que funciona ;-)

Por lo tanto, voy a mostrar StupidSheet como un ejemplo de que el desarrollo de software con interface gráfica es mas facil de lo que la gente cree.

Como siempre, si mencionás esta página, te ganás una cerveza. Máximo 2 cervezas, no muy buenas.


I will be at CafeConf again this year. It's a 45-minute thing about PyQt in October 13th, at noon.

My idea is: People are amazed when you hook Khtml to a couple of widgets and write a lame web browser. They should be more amazed that it is possible to start with nothing more than a 20-line recipe from a website and end with a functional spreadsheet ;-)

So, I will be showing StupidSheet as an example of how writing GUI software in Python is simpler than people think.

As usual, if you mention this page, you get a free beer, maximum 2 beers, and not very good beer.