Ir al contenido principal

Ralsina.Me — El sitio web de Roberto Alsina

Stupid Sheet: Redoing cell displacements

For my spread­sheet pro­jec­t, I had to re­do some­thing I had for­got­ten about: cell dis­place­men­t. I did that once when the for­mu­la lan­guage was python.

At the time, I parsed the python us­ing the to­k­enize mod­ule and Ka-Ping Yee's re­gur­gi­tate.

Python->­To­ken­s->Dis­place cell­s->re­gur­gi­tate->Python

Since I have lots of oth­er things to do, I de­cid­ed to do it the same way, and wrote the equiv­a­lent of re­gur­gi­tate for Trax­ter, my for­mu­la lan­guage.

It turns out it was not re­al­ly hard, but I had to re­do parts of the pars­er so it kept more in­for­ma­tion about the source.

Af­ter that it was sim­ple, you see, Trax­ter com­piles to Python. That means all I had to do was an­oth­er (very sim­i­lar) back­end, and there it is, a Trax­ter-­to-­Trax­ter com­pil­er. Or de­com­pil­er. Or some­thing. And rel­a­tive cell ref­er­ences are work­ing again (and now with the right syn­tax).

LINA: Intriguing

It promis­es:

  • Na­­tive look&feel for your apps on all ma­jor plat­­forms

  • Works for graph­i­­cal and ter­mi­­nal apps

  • Open Source

  • Low over­­head

  • Uni­ver­sal bi­­na­ries that work on Win­­dows, Mac and Lin­ux

Seems too good to be true, but it is sup­posed to be re­leased next month.

A graph is a graph is a graph.

Af­ter hack­ing for about two hours the cell de­pen­den­cies yes­ter­day us­ing dict­s, I found my­self say­ing "how can I check if the de­pen­den­cy graph is cycli­cal?"

And of course it hit me. That's the de­pen­den­cy graph. Use a graph li­brary!

I looked for about 2 min­utes be­fore find­ing one that's prob­a­bly not op­ti­mal, but is damn sim­ple and in the pub­lic do­main: graph_lib.py by Nathan Den­ny.

First in­ter­est­ing da­ta point, the first two lines on the file (yes, I found out there is a lat­er ver­sion):

#--Version 1.0.0
#--Nathan Denny, May 27, 1999

Yup. In two days this piece of code is turn­ing 8 years old and un­touched. But it works just fine and dandy!

Here's the piece of code that makes the en­gine run which has grown from a hum­ble 10 LOC to al­most a whoop­ing 40 LOC:

class SpreadSheet(QtCore.QObject):
    _cells = {}
    tools = {}
    graph=Graph()

    def __init__(self,parent):
        QtCore.QObject.__init__(self,parent)
        for name in dir(functions):
                self.tools[name]=eval('functions.'+name)


    def __setitem__(self, key, formula):
        key=key.lower()
        c=traxcompile('%s=%s;'%(key,formula))
        self._cells[key] = [c[key][0],False,compile(c[key][0],"Formula for %s"%key,'eval')]

        # Dependency graph
        if not self.graph.has_node(key):
                self.graph.add_node(key)
        for edge in self.graph.in_arcs(key):
                self.graph.delete_edge(edge)
        for cell in c[key][1]:
                self.graph.add_edge(cell,key)
        try:
                print 'GRAPH(TOPO): ',self.graph.topological_sort()
                self._cells[key][1]=False
                print 'GRAPH(BFS) : ',self.graph.bfs(key)
                for cell in self.graph.bfs(key)[1:]:
                        self.emit(QtCore.SIGNAL('changed'),(cell))
        except Graph_topological_error:
                # We made the graph cyclic
                # So, mark this cell as evil
                self._cells[key][1]=True
                # And remove all incoming edges to go back to
                # status quo
                for edge in self.graph.in_arcs(key):
                        self.graph.delete_edge(edge)

    def getformula(self, key):
        key=key.lower()
        return self._cells[key][0]
    def __getitem__(self, key ):
        print self._cells[key]
        if self._cells[key][1]:
                return "ERROR: cyclic dependency"
        else:
                print 'evaluating [%s]: '%key,type(self._cells[key][0]),self._cells[key][0]
              return eval(self._cells[key][0], self.tools, self)

This en­gine sup­port­s:

  • Us­er de­fined func­­tions (just add them in for­­mu­la.py)

  • Cell de­pen­­den­­cies with de­tec­­tion of ci­­cles, of any length

  • Un­lim­it­ed size spread­­sheets

  • No­ti­­fi­­ca­­tion of cell changes to oth­­er mod­­ules

  • Au­­to­­mat­ic re­­cal­cu­la­­tion

  • A cus­­tom for­­mu­la lan­guage (ok, that's not in this piece of code ;-)

The whole project is now 1167 LOC, of which 591 are gen­er­at­ed or graph_lib, which means ev­ery­thing in­clud­ing the trax­ter "com­pil­er" and the UI is less than 600 lines of code.

Not too shab­by.

My goal is a func­tion­al spread­sheet with a work­ing GUI and sup­port­ing the most com­mon func­tions and pieces of the for­mu­la lan­guage in .... 2000 LOC, and I am will­ing to work on it for about two week­s.

Let's see how it turns out. Of course any­one with 5 free min­utes can con­trib­ute his favourite spread­sheet func­tion (I al­ready have sum ;-)

New software project: Stupid Sheet

Adding some­thing else to my plate is prob­a­bly not a very good idea, but what the heck, I can make it sleep an­oth­er three years if I lose in­ter­est.

So: I am writ­ing a re­al spread­sheet in python.

Prob­a­bly nev­er go­ing to be use­ful for cor­po­ra­tions, but it should be at least as fea­ture­ful as Google's and it should be amaz­ing­ly smal­l.

Here are the com­po­nents:

  • Trax­ter: my spread­­sheet-­­for­­mu­la-­­like-lan­guage with de­pen­­den­­cy track­­ing that com­piles to python.

  • PyQt (hey, it has a grid wid­get)

  • Python (Of course)

The sta­tus right now:

  • It's al­­most as func­­tion­al as it was 2.5 years ago

  • Ex­­cept for bro­ken rel­a­­tive cel­l­s.

  • But with the be­gin­n­ing of a re­al for­­mu­la lan­guage.

  • With au­­to­­mat­ic re­­cal­cu­la­­tion and cyclic de­pen­­den­­cy check­­s.

  • Adding things is dead sim­­ple. Here's the im­­ple­­men­­ta­­tion of SUM (not up­­load­­ed yet, though):

def sum(*args):
  ac=0
  for v in args:
    ac+=v
  return ac

All the range stuff hap­pens be­hind the scenes (although you may get a func­tion called with thou­sands of args... I won­der how well Python han­dles that).

You can check it at the google code project (Use the SVN).

The python spreadsheet: Another look (Traxter DSL)

I apol­o­gize in ad­vance for any ug­ly am­a­teurism in this post. It's my first at­tempt at a do­main spe­cif­ic lan­guage :-)

Yes­ter­day I post­ed about us­ing Py­Cells to write a spread­sheet in Python.

Sad­ly, I can't fig­ure out the prob­lem with my code, and the Py­Cells mail­ing list seems to be pret­ty much dead.

So, I start­ed think­ing... what oth­er ways are to achieve my goal? And de­cid­ed to go me­dieval on this prob­lem.

By that I mean that I will do it the most tra­di­tion­al way pos­si­ble... with a twist.

The tra­di­tion­al way is, of course, to write one or more of lex­er/­parser/in­ter­preter/­com­pil­er for the for­mu­la lan­guage.

Mind you, I don't in­tend to do any­thing com­plete, much less Ex­cel-­com­pat­i­ble (see Ex­cel for­mu­la parsers are hell in this same blog.

So, let's start with a toy lan­guage, sup­port­ing the fol­low­ing:

  • As­sign­­ment to a var­i­able

  • Clas­sic 4-op arith­met­ic­s.

  • Func­­tion calls

  • Cell ranges

That's enough for a toy spread­sheet, and it should be easy to ex­tend.

Here's a de­scrip­tion of the gram­mar for such a lan­guage, writ­ten us­ing Ape­ri­ot 1:

# This is a simple language for arithmetic expressions

numbers
     number

operators
     plus   "+"
     times  "*"
     minus  "-"
     div    "/"
     equal  "="
     colon ":"
     comma ","
     semicolon ";"

brackets
     lpar  "("
     rpar  ")"


identifiers
     label

start
     LIST

rules

LIST             -> ASSIGNMENT                : "[$1]"
                  | ASSIGNMENT semicolon LIST : "[$1]+$3"
                  | ASSIGNMENT semicolon : "[$1]"

ASSIGNMENT       -> label equal EXPR : "($1,$3)"

ARGLIST          -> ARG comma ARGLIST : "[$1]+$3"
                  | ARG          : "[$1]"

ARG              -> RANGE       : "$1"
                  | EXPR        : "$1"
                  | label       : "$1"

EXPR             -> TERM              : "$1"
                  | TERM plus EXPR    : "(\'+\',$1,$3)"
                  | TERM minus EXPR   : "(\'-\',$1,$3)"

TERM             -> FACTOR               : "$1"
                  | FACTOR times TERM    : "(\'*\',$1,$3)"
                  | FACTOR div TERM      : "(\'/\',$1,$3)"


FACTOR           -> number           : "$1.val()"
                  | lpar EXPR rpar  : "(\'group\',$2)"
                  | FUNCALL     : "$1"
                  | label               : "$1"
                  | minus FACTOR    : "-$2"

FUNCALL          ->  label lpar ARGLIST rpar : "(\'funcall\',$1,$3)"

RANGE            -> label colon label   : "(\'range\',$1,$3)"

This trans­forms this:

A1=SUM(A1:A7)*2;
A3=2+2;

In­to this:

[(<aperiot.lexer.Identifier instance at 0xb7af10ac>,
  ('*',
   ('funcall',
    <aperiot.lexer.Identifier instance at 0xb7af142c>,
    [('range',
      <aperiot.lexer.Identifier instance at 0xb7af15cc>,
      <aperiot.lexer.Identifier instance at 0xb7af144c>)]),
   2)),
 (<aperiot.lexer.Identifier instance at 0xb7b4c72c>, ('+', 2, 2))]

Which is sort of a tree with all the ex­pres­sions in pre­fix no­ta­tion in them.

Now, here is the twist: I will "com­pile" this tree in­to.... python code. So I can use eval to do the eval­u­a­tion, just like in the orig­i­nal python spread­sheet recipe.

So this is sort of a pre­pro­ces­sor:

  • The us­er writes ex­cel-­­like for­­mu­las.

  • The spread­­sheet stores python code ob­­tained through com­pi­la­­tion.

  • The spread­­sheet evals the python code.

Of course we have the same prob­lem as usu­al: cell de­pen­den­cies, which is the rea­son why I start­ed play­ing with Py­Cells in the first place!

But... well, here's an­oth­er trick: since I am com­pil­ing, I know when­ev­er there is a vari­able ref­er­enced in the code. And I can re­mem­ber them :-)

So, I can turn this:

A1=SUM(A1:A3)*2;
A3=2+2;

In­to this:

[['A1=SUM(a1,a2,a3)*2;', set(['a1', 'a3', 'a2'])],
 ['A3=2+2;', set([])]]

The "com­piled" python code and a de­pen­den­cy set. And voila, this spread­sheet will prop­a­gate cor­rect­ly.

Here's the com­pil­er... in about 60 lines of python 2. And since the whole point of this lan­guage is to track de­pen­den­cies... let's call it Trax­ter.

Of course, this is a toy right now. But it's a toy with po­ten­tial!

from pprint import pprint
from aperiot.parsergen import build_parser
import aperiot
import cellutils
import sys

dependencies=set()

def addOp(*args):
        return '+'.join([compile_token(a) for a in args])
def mulOp(*args):
        return '*'.join([compile_token(a) for a in args])
def subOp(*args):
        return '-'.join([compile_token(a) for a in args])
def divOp(*args):
        return '/'.join([compile_token(a) for a in args])

def groupOp(*args):
        return '(%s)'%compile_token(args[0])

def funcOp(*args):
        return '%s(%s)'%(args[0].symbolic_name,
                         ','.join([compile_token(a) for a in args[1]]))

def rangeOp(*args):
        c1=args[0].symbolic_name
        c2=args[1].symbolic_name
        return ','.join([compile_token(a) for a in cellutils.cellrange(c1,c2)])

operators={'+':addOp,
           '-':subOp,
           '*':mulOp,
           '/':divOp,
           'group':groupOp,
           'funcall':funcOp,
           'range':rangeOp
           }


def compile_token(token):
        if isinstance (token,aperiot.lexer.Identifier):
                v=token.symbolic_name.lower()
                dependencies.add(v)
                return v
        if isinstance(token,list) or isinstance(token,tuple):
            return apply(operators[token[0]],token[1:])
        return str(token)

def compile_assignment(tokens):
        target=tokens[0].symbolic_name
        compiled=compile_token(tokens[1])
        return '%s=%s;'%(target,compiled)


myparser = build_parser('traxter')
t='A1=SUM(A1:A7)*2;A3=2+2;'
assign_list=myparser.parse(t)
pprint (assign_list)

compiled=[]
for assignment in assign_list:
        dependencies=set()
        c=compile_assignment(assignment)
        compiled.append([c,dependencies])

print compiled
1

You may be ask­ing your­self:what the heck is Ape­ri­ot? Or Why the heck Ape­ri­ot? Well... I had nev­er heard of it un­til 6 hours ago, and I just wrote a DSL us­ing it. That means it's worth know­ing.

2

cell­range() is left as an ex­er­cise for the read­er be­cause my cur­rent im­ple­men­ta­tion is shame­ful ;-)


Contents © 2000-2021 Roberto Alsina