A graph is a graph is a graph.
After hacking for about two hours the cell dependencies yesterday using dicts, I found myself saying "how can I check if the dependency graph is cyclical?"
And of course it hit me. That's the dependency graph. Use a graph library!
I looked for about 2 minutes before finding one that's probably not optimal, but is damn simple and in the public domain: graph_lib.py by Nathan Denny.
First interesting data point, the first two lines on the file (yes, I found out there is a later version):
#--Version 1.0.0 #--Nathan Denny, May 27, 1999
Yup. In two days this piece of code is turning 8 years old and untouched. But it works just fine and dandy!
Here's the piece of code that makes the engine run which has grown from a humble 10 LOC to almost a whooping 40 LOC:
class SpreadSheet(QtCore.QObject): _cells = {} tools = {} graph=Graph() def __init__(self,parent): QtCore.QObject.__init__(self,parent) for name in dir(functions): self.tools[name]=eval('functions.'+name) def __setitem__(self, key, formula): key=key.lower() c=traxcompile('%s=%s;'%(key,formula)) self._cells[key] = [c[key][0],False,compile(c[key][0],"Formula for %s"%key,'eval')] # Dependency graph if not self.graph.has_node(key): self.graph.add_node(key) for edge in self.graph.in_arcs(key): self.graph.delete_edge(edge) for cell in c[key][1]: self.graph.add_edge(cell,key) try: print 'GRAPH(TOPO): ',self.graph.topological_sort() self._cells[key][1]=False print 'GRAPH(BFS) : ',self.graph.bfs(key) for cell in self.graph.bfs(key)[1:]: self.emit(QtCore.SIGNAL('changed'),(cell)) except Graph_topological_error: # We made the graph cyclic # So, mark this cell as evil self._cells[key][1]=True # And remove all incoming edges to go back to # status quo for edge in self.graph.in_arcs(key): self.graph.delete_edge(edge) def getformula(self, key): key=key.lower() return self._cells[key][0] def __getitem__(self, key ): print self._cells[key] if self._cells[key][1]: return "ERROR: cyclic dependency" else: print 'evaluating [%s]: '%key,type(self._cells[key][0]),self._cells[key][0] return eval(self._cells[key][0], self.tools, self)
This engine supports:
User defined functions (just add them in formula.py)
Cell dependencies with detection of cicles, of any length
Unlimited size spreadsheets
Notification of cell changes to other modules
Automatic recalculation
A custom formula language (ok, that's not in this piece of code ;-)
The whole project is now 1167 LOC, of which 591 are generated or graph_lib, which means everything including the traxter "compiler" and the UI is less than 600 lines of code.
Not too shabby.
My goal is a functional spreadsheet with a working GUI and supporting the most common functions and pieces of the formula language in .... 2000 LOC, and I am willing to work on it for about two weeks.
Let's see how it turns out. Of course anyone with 5 free minutes can contribute his favourite spreadsheet function (I already have sum ;-)
Hi Roberto,
Do you know this small library http://cheeseshop.python.or... to implement lazy evaluation and track dependencies ?
It could be an interesting tool for stupidsheet.
best,
Seb
No, I had not found that one :-)
It has the same problem as the original recipe in that lazyness is not really something you want in a spreadsheet, but if it can be told to trigger calculations, then maybe it's useful.
I'll have to take a look.
Actually, lazyness is something you might want. For example, if A depends on B and a script changes B multiple times before A needs to be read, you should only have to calculate A once. This starts to be a big deal if you have a very deep dependency graph.
And as far as forcing recalculation, that's no problem. Check out the 'observers' module in Cellulose for an example of how it can be done. Whenever an observer cell receives notification that a dependency has possibly changed, it adds itself to a set which can later be 'flushed'.
You spreadsheet project sounds great!
MWM
BTW, Cellulose also solves the cyclic dependency problem. The cells that are currently being calculated are stored in a stack. By default an assert is made to make sure that a cell is not already being calculated.
MWM
MWM: thanks for the tips, I will chek it out.
However, that's not the real problem with cyclic dependencies in a spreadsheet.
What that approach does is avoid infinite loops. The real problem is that all the cells in the cycle are nonsensical or undefined, so you shouldn't ever let the user enter a formula that creates a loop.
Isn't that the same thing? When the user enters a formula you try evaluating it. If evaluating it fails due to cyclic dependencies you give feedback similar to how you would for a syntax error or something else.
Or, are you talking about a formula that is sometimes cyclic and sometimes not? (Due to branching.) Using raw cellulose cells wouldn't work for that.
(I haven't actually read you post yet, as I don't have much time online. I'll read it and be back in a day or two.)
MWM
You are right, if I try to evaluate a cell's formula, and reject the formula when it fails, the effect would be the same as my current approach.
The only difference would be that I detect it on parsing, and cellulose needs to do an evaluation, but that's not important.