Ir al contenido principal

# A graph is a graph is a graph.

Af­ter hack­ing for about two hours the cell de­pen­den­cies yes­ter­day us­ing dict­s, I found my­self say­ing "how can I check if the de­pen­den­cy graph is cycli­cal?"

And of course it hit me. That's the de­pen­den­cy graph. Use a graph li­brary!

I looked for about 2 min­utes be­fore find­ing one that's prob­a­bly not op­ti­mal, but is damn sim­ple and in the pub­lic do­main: graph_lib.py by Nathan Den­ny.

First in­ter­est­ing da­ta point, the first two lines on the file (yes, I found out there is a lat­er ver­sion):

```#--Version 1.0.0
#--Nathan Denny, May 27, 1999```

Yup. In two days this piece of code is turn­ing 8 years old and un­touched. But it works just fine and dandy!

Here's the piece of code that makes the en­gine run which has grown from a hum­ble 10 LOC to al­most a whoop­ing 40 LOC:

```class SpreadSheet(QtCore.QObject):
_cells = {}
tools = {}
graph=Graph()

def __init__(self,parent):
QtCore.QObject.__init__(self,parent)
for name in dir(functions):
self.tools[name]=eval('functions.'+name)

def __setitem__(self, key, formula):
key=key.lower()
c=traxcompile('%s=%s;'%(key,formula))
self._cells[key] = [c[key][0],False,compile(c[key][0],"Formula for %s"%key,'eval')]

# Dependency graph
if not self.graph.has_node(key):
for edge in self.graph.in_arcs(key):
self.graph.delete_edge(edge)
for cell in c[key][1]:
try:
print 'GRAPH(TOPO): ',self.graph.topological_sort()
self._cells[key][1]=False
print 'GRAPH(BFS) : ',self.graph.bfs(key)
for cell in self.graph.bfs(key)[1:]:
self.emit(QtCore.SIGNAL('changed'),(cell))
except Graph_topological_error:
# We made the graph cyclic
# So, mark this cell as evil
self._cells[key][1]=True
# And remove all incoming edges to go back to
# status quo
for edge in self.graph.in_arcs(key):
self.graph.delete_edge(edge)

def getformula(self, key):
key=key.lower()
return self._cells[key][0]
def __getitem__(self, key ):
print self._cells[key]
if self._cells[key][1]:
return "ERROR: cyclic dependency"
else:
print 'evaluating [%s]: '%key,type(self._cells[key][0]),self._cells[key][0]
return eval(self._cells[key][0], self.tools, self)
```

This en­gine sup­port­s:

• Us­er de­fined func­­tions (just add them in for­­mu­la.py)

• Cell de­pen­­den­­cies with de­tec­­tion of ci­­cles, of any length

• No­ti­­fi­­ca­­tion of cell changes to oth­­er mod­­ules

• Au­­to­­mat­ic re­­cal­cu­la­­tion

• A cus­­tom for­­mu­la lan­guage (ok, that's not in this piece of code ;-)

The whole project is now 1167 LOC, of which 591 are gen­er­at­ed or graph_lib, which means ev­ery­thing in­clud­ing the trax­ter "com­pil­er" and the UI is less than 600 lines of code.

Not too shab­by.

My goal is a func­tion­al spread­sheet with a work­ing GUI and sup­port­ing the most com­mon func­tions and pieces of the for­mu­la lan­guage in .... 2000 LOC, and I am will­ing to work on it for about two week­s.

Let's see how it turns out. Of course any­one with 5 free min­utes can con­trib­ute his favourite spread­sheet func­tion (I al­ready have sum ;-)

# New software project: Stupid Sheet

Adding some­thing else to my plate is prob­a­bly not a very good idea, but what the heck, I can make it sleep an­oth­er three years if I lose in­ter­est.

So: I am writ­ing a re­al spread­sheet in python.

Prob­a­bly nev­er go­ing to be use­ful for cor­po­ra­tions, but it should be at least as fea­ture­ful as Google's and it should be amaz­ing­ly smal­l.

Here are the com­po­nents:

• Trax­ter: my spread­­sheet-­­for­­mu­la-­­like-lan­guage with de­pen­­den­­cy track­­ing that com­piles to python.

• PyQt (hey, it has a grid wid­get)

• Python (Of course)

The sta­tus right now:

• It's al­­most as func­­tion­al as it was 2.5 years ago

• Ex­­cept for bro­ken rel­a­­tive cel­l­s.

• But with the be­gin­n­ing of a re­al for­­mu­la lan­guage.

• With au­­to­­mat­ic re­­cal­cu­la­­tion and cyclic de­pen­­den­­cy check­­s.

```def sum(*args):
ac=0
for v in args:
ac+=v
return ac
```

All the range stuff hap­pens be­hind the scenes (although you may get a func­tion called with thou­sands of args... I won­der how well Python han­dles that).

You can check it at the google code project (Use the SVN).

# The python spreadsheet: Another look (Traxter DSL)

I apol­o­gize in ad­vance for any ug­ly am­a­teurism in this post. It's my first at­tempt at a do­main spe­cif­ic lan­guage :-)

Yes­ter­day I post­ed about us­ing Py­Cells to write a spread­sheet in Python.

Sad­ly, I can't fig­ure out the prob­lem with my code, and the Py­Cells mail­ing list seems to be pret­ty much dead.

So, I start­ed think­ing... what oth­er ways are to achieve my goal? And de­cid­ed to go me­dieval on this prob­lem.

By that I mean that I will do it the most tra­di­tion­al way pos­si­ble... with a twist.

The tra­di­tion­al way is, of course, to write one or more of lex­er/­parser/in­ter­preter/­com­pil­er for the for­mu­la lan­guage.

Mind you, I don't in­tend to do any­thing com­plete, much less Ex­cel-­com­pat­i­ble (see Ex­cel for­mu­la parsers are hell in this same blog.

• As­sign­­ment to a var­i­able

• Clas­sic 4-op arith­met­ic­s.

• Func­­tion calls

• Cell ranges

That's enough for a toy spread­sheet, and it should be easy to ex­tend.

Here's a de­scrip­tion of the gram­mar for such a lan­guage, writ­ten us­ing Ape­ri­ot [1]:

```# This is a simple language for arithmetic expressions

numbers
number

operators
plus   "+"
times  "*"
minus  "-"
div    "/"
equal  "="
colon ":"
comma ","
semicolon ";"

brackets
lpar  "("
rpar  ")"

identifiers
label

start
LIST

rules

LIST             -> ASSIGNMENT                : "[\$1]"
| ASSIGNMENT semicolon LIST : "[\$1]+\$3"
| ASSIGNMENT semicolon : "[\$1]"

ASSIGNMENT       -> label equal EXPR : "(\$1,\$3)"

ARGLIST          -> ARG comma ARGLIST : "[\$1]+\$3"
| ARG          : "[\$1]"

ARG              -> RANGE       : "\$1"
| EXPR        : "\$1"
| label       : "\$1"

EXPR             -> TERM              : "\$1"
| TERM plus EXPR    : "(\'+\',\$1,\$3)"
| TERM minus EXPR   : "(\'-\',\$1,\$3)"

TERM             -> FACTOR               : "\$1"
| FACTOR times TERM    : "(\'*\',\$1,\$3)"
| FACTOR div TERM      : "(\'/\',\$1,\$3)"

FACTOR           -> number           : "\$1.val()"
| lpar EXPR rpar  : "(\'group\',\$2)"
| FUNCALL     : "\$1"
| label               : "\$1"
| minus FACTOR    : "-\$2"

FUNCALL          ->  label lpar ARGLIST rpar : "(\'funcall\',\$1,\$3)"

RANGE            -> label colon label   : "(\'range\',\$1,\$3)"```

This trans­forms this:

```A1=SUM(A1:A7)*2;
A3=2+2;```

In­to this:

```[(<aperiot.lexer.Identifier instance at 0xb7af10ac>,
('*',
('funcall',
<aperiot.lexer.Identifier instance at 0xb7af142c>,
[('range',
<aperiot.lexer.Identifier instance at 0xb7af15cc>,
<aperiot.lexer.Identifier instance at 0xb7af144c>)]),
2)),
(<aperiot.lexer.Identifier instance at 0xb7b4c72c>, ('+', 2, 2))]```

Which is sort of a tree with all the ex­pres­sions in pre­fix no­ta­tion in them.

Now, here is the twist: I will "com­pile" this tree in­to.... python code. So I can use eval to do the eval­u­a­tion, just like in the orig­i­nal python spread­sheet recipe.

So this is sort of a pre­pro­ces­sor:

• The us­er writes ex­cel-­­like for­­mu­las.

• The spread­­sheet stores python code ob­­tained through com­pi­la­­tion.

• The spread­­sheet evals the python code.

Of course we have the same prob­lem as usu­al: cell de­pen­den­cies, which is the rea­son why I start­ed play­ing with Py­Cells in the first place!

But... well, here's an­oth­er trick: since I am com­pil­ing, I know when­ev­er there is a vari­able ref­er­enced in the code. And I can re­mem­ber them :-)

So, I can turn this:

```A1=SUM(A1:A3)*2;
A3=2+2;```

In­to this:

```[['A1=SUM(a1,a2,a3)*2;', set(['a1', 'a3', 'a2'])],
['A3=2+2;', set([])]]```

The "com­piled" python code and a de­pen­den­cy set. And voila, this spread­sheet will prop­a­gate cor­rect­ly.

Here's the com­pil­er... in about 60 lines of python [2]. And since the whole point of this lan­guage is to track de­pen­den­cies... let's call it Trax­ter.

Of course, this is a toy right now. But it's a toy with po­ten­tial!

```from pprint import pprint
from aperiot.parsergen import build_parser
import aperiot
import cellutils
import sys

dependencies=set()

return '+'.join([compile_token(a) for a in args])
def mulOp(*args):
return '*'.join([compile_token(a) for a in args])
def subOp(*args):
return '-'.join([compile_token(a) for a in args])
def divOp(*args):
return '/'.join([compile_token(a) for a in args])

def groupOp(*args):
return '(%s)'%compile_token(args[0])

def funcOp(*args):
return '%s(%s)'%(args[0].symbolic_name,
','.join([compile_token(a) for a in args[1]]))

def rangeOp(*args):
c1=args[0].symbolic_name
c2=args[1].symbolic_name
return ','.join([compile_token(a) for a in cellutils.cellrange(c1,c2)])

'-':subOp,
'*':mulOp,
'/':divOp,
'group':groupOp,
'funcall':funcOp,
'range':rangeOp
}

def compile_token(token):
if isinstance (token,aperiot.lexer.Identifier):
v=token.symbolic_name.lower()
return v
if isinstance(token,list) or isinstance(token,tuple):
return apply(operators[token[0]],token[1:])
return str(token)

def compile_assignment(tokens):
target=tokens[0].symbolic_name
compiled=compile_token(tokens[1])
return '%s=%s;'%(target,compiled)

myparser = build_parser('traxter')
t='A1=SUM(A1:A7)*2;A3=2+2;'
assign_list=myparser.parse(t)
pprint (assign_list)

compiled=[]
for assignment in assign_list:
dependencies=set()
c=compile_assignment(assignment)
compiled.append([c,dependencies])

print compiled
```

# PyCells: The Python SpreadSheet redux

In 2004 I saw a recipe about how to make a "spread­sheet" in python in 10 lines of code:

```class SpreadSheet:
_cells = {}
tools = {}
def __setitem__(self, key, formula):
self._cells[key] = formula
def getformula(self, key):
return self._cells[key]
def __getitem__(self, key ):
```

It's shock­ing. And it work­s, too:

```>>> from math import sin, pi
>>> ss['a1'] = '5'
>>> ss['a2'] = 'a1*6'
>>> ss['a3'] = 'a2*7'
>>> ss['a3']
210
>>> ss['b1'] = 'sin(pi/4)'
>>> ss['b1']
0.70710678118654746
>>> ss.getformula('b1')
'sin(pi/4)'
```

I was so awed, I wrote a PyQt ver­sion . Of course there is a catch in that code: it sucks if you are try­ing to write a spread­sheet with it.

Why? Be­cause it does­n't store re­sult­s, but on­ly for­mu­las.

For ex­am­ple:

```A1=2
A2=A1*2```

If you ask for the val­ue of A2, you get 4. If you set A1 to 7, what's the val­ue of A2?

Well, it's noth­ing yet, be­cause it's on­ly cal­cu­lat­ed when you ask for it. But sup­pose you are try­ing to dis­play that sheet... you need to know A2's val­ue changed when you set A1!

That's cell de­pen­den­cies, and while that sim­ple code han­dles them in a way, it to­tal­ly sucks in an­oth­er.

So, I went ahead and cod­ed around it suc­cess­ful­ly. Of course the code was not so pret­ty any­more (although a large part of the ug­ly­ness is just for mak­ing it work with Python 2.3 and rel­a­tive cell­s).

Then yes­ter­day, while look­ing at the ex­cel for­mu­la pars­er mad­ness I saw a ref­er­ence to Py­Cells, a python port of Cells from CLOS.

Here is a blog com­ment­ing on Py­cells:

It ba­si­cal­ly takes the con­cept of a cell in a spread­sheet that get up­dat­ed au­to­mat­i­cal­ly to pro­gram­ming where there are a lot of in­ter­nal da­ta states that are de­pen­dent on one an­oth­er in a chain, or a com­plex graph of de­pen­den­cies. Like, the col­or of a but­ton de­pends on whether you se­lect­ed a ra­dio but­ton or not. Or, shut down the mo­tor if the sen­sor reads above 100 de­grees (ex­am­ple giv­en in tex­t).

Al­most ev­ery­one us­es that anal­o­gy... how­ev­er, no mat­ter how hard I looked, I could­n't find any­one who had ac­tu­al­ly tried writ­ing a spread­sheet us­ing Py­Cell­s! Not even as an ex­am­ple!

So here it is:

```import cells

class Cell(cells.Model):
formula=cells.makecell(value='')
@cells.fun2cell()
def value(self,prev):
print "eval ",self.formula
return eval(self.formula, {}, self.ss)
def __init__(self, ss, *args, **kwargs):
self.ss=ss
cells.Model.__init__(self, *args, **kwargs)

class ssDict:
def __init__(self):
self.ss={}

def __getitem__(self,key):
return self.ss[key].value

def __setitem__(self,key,v):
if not self.ss.has_key(key):
c=Cell(self)
c.formula=v
self.ss[key]=c
else:
self.ss[key].formula=v

if __name__ == "__main__":
ss=ssDict()
ss['a1'] ='5'

ss['a2']='2*a1'
print "a1: ", ss['a1']
print "a2: ", ss['a2']

ss['a1'] = '7'
print "a1: ", ss['a1']
print "a2: ", ss['a2']
```

And here you can see it run­ning:

```[ralsina@monty cells]\$ python ctest.py
a1:  eval  5
5
a2:  eval  2*a1
10
eval  7
eval  2*a1
a1:  7
a2:  14```

See how when I set a1 to 7 I get "e­val 7" and "e­val 2*a1"? That's be­cause it's prop­a­gat­ing changes the right way. And that's why this would work as a ba­sis for a spread­sheet.

# Excel formula parsers are hell

On 2004 I wrote a spread­sheet in python, which was about a 25KB down­load (com­pressed). It was pret­ty func­tion­al!.

The main prob­lem that would nev­er let that pro­gram be a re­al ap­pli­ca­tion was that it used python as a for­mu­la lan­guage in­stead of the tra­di­tion­al Lo­tus/Ex­cel thing.

Yes­ter­day I de­cid­ed to look at it again, and see if there was a way around it [1]. My main as­set was that I know a fair bit more about parsers now than I did then, and how hard could it be?

Well, it turns out that lan­guage is a whole lot hard­er than I knew!

Here's an ex­am­ple from it:

```=IF(R[39]C[11]>65,R[25]C[42],ROUND((R[11]C[11]*IF(OR(AND(R[39]C[11]>=55,
R[40]C[11]>=20),AND(R[40]C[11]>=20,R11C3="YES")),R[44]C[11],R[43]C[11]))+
(R[14]C[11] *IF(OR(AND(R[39]C[11]>=55,R[40]C[11]>=20),AND(R[40]C[11]>=20,
R11C3="YES")), R[45]C[11],R[43]C[11])),0))```

Yes, that's a valid Ex­cel for­mu­la. No, I have no idea what it does.

And here's some ex­tra nuggets about the lan­guage (quot­ed from the same page):

• For some rea­­son Ex­­cel us­es a com­­ma as the range union op­er­a­­tor. This means that its on­­ly un­am­bigu­ous use be­tween ranges is out­­­side of a func­­tion call or with­­in a sub­­ex­pres­­sion (i.e., be­tween paren­the­s­es).

• For some oth­­er rea­­son Ex­­cel us­es a space (or mul­ti­­ple spaces) as the range in­­ter­sec­­tion op­er­a­­tor. While not as am­bigu­ous as the com­­ma, it does re­quire some con­sid­er­a­­tion.

And it goes on, and on, and on....

But hey, he wrote a nice parser!

Just for kick­s, here's the best BNF I could find.

I re­al­ly don't en­vy the KSpread au­thors if they try to be com­pat­i­ble to this garbage!

But why is this lan­guage so gnarly?

• Be­­cause it has been grow­ing or­­gan­i­­cal­­ly for 30 years?

• Be­­cause of back­­wards com­­pat­i­­bil­i­­ty?

• Be­­cause it was nev­er de­signed?

• All of the above?

I mean, it can't be in­ten­tion­al! Noone ac­tu­al­ly meant it to be like this, right?