Posts about python (old posts, page 12)

2007-05-25 19:06

A graph is a graph is a graph.

After hacking for about two hours the cell dependencies yesterday using dicts, I found myself saying "how can I check if the dependency graph is cyclical?"

And of course it hit me. That's the dependency graph. Use a graph library!

I looked for about 2 minutes before finding one that's probably not optimal, but is damn simple and in the public domain: graph_lib.py by Nathan Denny.

First interesting data point, the first two lines on the file (yes, I found out there is a later version):

#--Version 1.0.0
#--Nathan Denny, May 27, 1999

Yup. In two days this piece of code is turning 8 years old and untouched. But it works just fine and dandy!

Here's the piece of code that makes the engine run which has grown from a humble 10 LOC to almost a whooping 40 LOC:

class SpreadSheet(QtCore.QObject):
    _cells = {}
    tools = {}
    graph=Graph()

    def __init__(self,parent):
        QtCore.QObject.__init__(self,parent)
        for name in dir(functions):
                self.tools[name]=eval('functions.'+name)


    def __setitem__(self, key, formula):
        key=key.lower()
        c=traxcompile('%s=%s;'%(key,formula))
        self._cells[key] = [c[key][0],False,compile(c[key][0],"Formula for %s"%key,'eval')]

        # Dependency graph
        if not self.graph.has_node(key):
                self.graph.add_node(key)
        for edge in self.graph.in_arcs(key):
                self.graph.delete_edge(edge)
        for cell in c[key][1]:
                self.graph.add_edge(cell,key)
        try:
                print 'GRAPH(TOPO): ',self.graph.topological_sort()
                self._cells[key][1]=False
                print 'GRAPH(BFS) : ',self.graph.bfs(key)
                for cell in self.graph.bfs(key)[1:]:
                        self.emit(QtCore.SIGNAL('changed'),(cell))
        except Graph_topological_error:
                # We made the graph cyclic
                # So, mark this cell as evil
                self._cells[key][1]=True
                # And remove all incoming edges to go back to
                # status quo
                for edge in self.graph.in_arcs(key):
                        self.graph.delete_edge(edge)

    def getformula(self, key):
        key=key.lower()
        return self._cells[key][0]
    def __getitem__(self, key ):
        print self._cells[key]
        if self._cells[key][1]:
                return "ERROR: cyclic dependency"
        else:
                print 'evaluating [%s]: '%key,type(self._cells[key][0]),self._cells[key][0]
              return eval(self._cells[key][0], self.tools, self)

This engine supports:

  • User defined functions (just add them in formula.py)
  • Cell dependencies with detection of cicles, of any length
  • Unlimited size spreadsheets
  • Notification of cell changes to other modules
  • Automatic recalculation
  • A custom formula language (ok, that's not in this piece of code ;-)

The whole project is now 1167 LOC, of which 591 are generated or graph_lib, which means everything including the traxter "compiler" and the UI is less than 600 lines of code.

Not too shabby.

My goal is a functional spreadsheet with a working GUI and supporting the most common functions and pieces of the formula language in .... 2000 LOC, and I am willing to work on it for about two weeks.

Let's see how it turns out. Of course anyone with 5 free minutes can contribute his favourite spreadsheet function (I already have sum ;-)

2007-05-24 18:02

New software project: Stupid Sheet

Adding something else to my plate is probably not a very good idea, but what the heck, I can make it sleep another three years if I lose interest.

So: I am writing a real spreadsheet in python.

Probably never going to be useful for corporations, but it should be at least as featureful as Google's and it should be amazingly small.

Here are the components:

  • Traxter: my spreadsheet-formula-like-language with dependency tracking that compiles to python.
  • PyQt (hey, it has a grid widget)
  • Python (Of course)

The status right now:

  • It's almost as functional as it was 2.5 years ago
  • Except for broken relative cells.
  • But with the beginning of a real formula language.
  • With automatic recalculation and cyclic dependency checks.
  • Adding things is dead simple. Here's the implementation of SUM (not uploaded yet, though):
def sum(*args):
  ac=0
  for v in args:
    ac+=v
  return ac

All the range stuff happens behind the scenes (although you may get a function called with thousands of args... I wonder how well Python handles that).

You can check it at the google code project (Use the SVN).

2007-05-23 18:31

The python spreadsheet: Another look (Traxter DSL)

I apologize in advance for any ugly amateurism in this post. It's my first attempt at a domain specific language :-)

Yesterday I posted about using PyCells to write a spreadsheet in Python.

Sadly, I can't figure out the problem with my code, and the PyCells mailing list seems to be pretty much dead.

So, I started thinking... what other ways are to achieve my goal? And decided to go medieval on this problem.

By that I mean that I will do it the most traditional way possible... with a twist.

The traditional way is, of course, to write one or more of lexer/parser/interpreter/compiler for the formula language.

Mind you, I don't intend to do anything complete, much less Excel-compatible (see Excel formula parsers are hell in this same blog.

So, let's start with a toy language, supporting the following:

  • Assignment to a variable
  • Classic 4-op arithmetics.
  • Function calls
  • Cell ranges

That's enough for a toy spreadsheet, and it should be easy to extend.

Here's a description of the grammar for such a language, written using Aperiot [1]:

# This is a simple language for arithmetic expressions

numbers
     number

operators
     plus   "+"
     times  "*"
     minus  "-"
     div    "/"
     equal  "="
     colon ":"
     comma ","
     semicolon ";"

brackets
     lpar  "("
     rpar  ")"


identifiers
     label

start
     LIST

rules

LIST             -> ASSIGNMENT                : "[$1]"
                  | ASSIGNMENT semicolon LIST : "[$1]+$3"
                  | ASSIGNMENT semicolon : "[$1]"

ASSIGNMENT       -> label equal EXPR : "($1,$3)"

ARGLIST          -> ARG comma ARGLIST : "[$1]+$3"
                  | ARG          : "[$1]"

ARG              -> RANGE       : "$1"
                  | EXPR        : "$1"
                  | label       : "$1"

EXPR             -> TERM              : "$1"
                  | TERM plus EXPR    : "(\'+\',$1,$3)"
                  | TERM minus EXPR   : "(\'-\',$1,$3)"

TERM             -> FACTOR               : "$1"
                  | FACTOR times TERM    : "(\'*\',$1,$3)"
                  | FACTOR div TERM      : "(\'/\',$1,$3)"


FACTOR           -> number           : "$1.val()"
                  | lpar EXPR rpar  : "(\'group\',$2)"
                  | FUNCALL     : "$1"
                  | label               : "$1"
                  | minus FACTOR    : "-$2"

FUNCALL          ->  label lpar ARGLIST rpar : "(\'funcall\',$1,$3)"

RANGE            -> label colon label   : "(\'range\',$1,$3)"

This transforms this:

A1=SUM(A1:A7)*2;
A3=2+2;

Into this:

[(<aperiot.lexer.Identifier instance at 0xb7af10ac>,
  ('*',
   ('funcall',
    <aperiot.lexer.Identifier instance at 0xb7af142c>,
    [('range',
      <aperiot.lexer.Identifier instance at 0xb7af15cc>,
      <aperiot.lexer.Identifier instance at 0xb7af144c>)]),
   2)),
 (<aperiot.lexer.Identifier instance at 0xb7b4c72c>, ('+', 2, 2))]

Which is sort of a tree with all the expressions in prefix notation in them.

Now, here is the twist: I will "compile" this tree into.... python code. So I can use eval to do the evaluation, just like in the original python spreadsheet recipe.

So this is sort of a preprocessor:

  • The user writes excel-like formulas.
  • The spreadsheet stores python code obtained through compilation.
  • The spreadsheet evals the python code.

Of course we have the same problem as usual: cell dependencies, which is the reason why I started playing with PyCells in the first place!

But... well, here's another trick: since I am compiling, I know whenever there is a variable referenced in the code. And I can remember them :-)

So, I can turn this:

A1=SUM(A1:A3)*2;
A3=2+2;

Into this:

[['A1=SUM(a1,a2,a3)*2;', set(['a1', 'a3', 'a2'])],
 ['A3=2+2;', set([])]]

The "compiled" python code and a dependency set. And voila, this spreadsheet will propagate correctly.

Here's the compiler... in about 60 lines of python [2]. And since the whole point of this language is to track dependencies... let's call it Traxter.

Of course, this is a toy right now. But it's a toy with potential!

from pprint import pprint
from aperiot.parsergen import build_parser
import aperiot
import cellutils
import sys

dependencies=set()

def addOp(*args):
        return '+'.join([compile_token(a) for a in args])
def mulOp(*args):
        return '*'.join([compile_token(a) for a in args])
def subOp(*args):
        return '-'.join([compile_token(a) for a in args])
def divOp(*args):
        return '/'.join([compile_token(a) for a in args])

def groupOp(*args):
        return '(%s)'%compile_token(args[0])

def funcOp(*args):
        return '%s(%s)'%(args[0].symbolic_name,
                         ','.join([compile_token(a) for a in args[1]]))

def rangeOp(*args):
        c1=args[0].symbolic_name
        c2=args[1].symbolic_name
        return ','.join([compile_token(a) for a in cellutils.cellrange(c1,c2)])

operators={'+':addOp,
           '-':subOp,
           '*':mulOp,
           '/':divOp,
           'group':groupOp,
           'funcall':funcOp,
           'range':rangeOp
           }


def compile_token(token):
        if isinstance (token,aperiot.lexer.Identifier):
                v=token.symbolic_name.lower()
                dependencies.add(v)
                return v
        if isinstance(token,list) or isinstance(token,tuple):
            return apply(operators[token[0]],token[1:])
        return str(token)

def compile_assignment(tokens):
        target=tokens[0].symbolic_name
        compiled=compile_token(tokens[1])
        return '%s=%s;'%(target,compiled)


myparser = build_parser('traxter')
t='A1=SUM(A1:A7)*2;A3=2+2;'
assign_list=myparser.parse(t)
pprint (assign_list)

compiled=[]
for assignment in assign_list:
        dependencies=set()
        c=compile_assignment(assignment)
        compiled.append([c,dependencies])

print compiled
[1] You may be asking yourself:what the heck is Aperiot? Or Why the heck Aperiot? Well... I had never heard of it until 6 hours ago, and I just wrote a DSL using it. That means it's worth knowing.
[2] cellrange() is left as an exercise for the reader because my current implementation is shameful ;-)

2007-05-22 10:18

PyCells: The Python SpreadSheet redux

In 2004 I saw a recipe about how to make a "spreadsheet" in python in 10 lines of code:

class SpreadSheet:
    _cells = {}
    tools = {}
    def __setitem__(self, key, formula):
        self._cells[key] = formula
    def getformula(self, key):
        return self._cells[key]
    def __getitem__(self, key ):
        return eval(self._cells[key], SpreadSheet.tools, self)

It's shocking. And it works, too:

>>> from math import sin, pi
>>> SpreadSheet.tools.update(sin=sin, pi=pi, len=len)
>>> ss = SpreadSheet()
>>> ss['a1'] = '5'
>>> ss['a2'] = 'a1*6'
>>> ss['a3'] = 'a2*7'
>>> ss['a3']
210
>>> ss['b1'] = 'sin(pi/4)'
>>> ss['b1']
0.70710678118654746
>>> ss.getformula('b1')
'sin(pi/4)'

I was so awed, I wrote a PyQt version . Of course there is a catch in that code: it sucks if you are trying to write a spreadsheet with it.

Why? Because it doesn't store results, but only formulas.

For example:

A1=2
A2=A1*2

If you ask for the value of A2, you get 4. If you set A1 to 7, what's the value of A2?

Well, it's nothing yet, because it's only calculated when you ask for it. But suppose you are trying to display that sheet... you need to know A2's value changed when you set A1!

That's cell dependencies, and while that simple code handles them in a way, it totally sucks in another.

So, I went ahead and coded around it successfully. Of course the code was not so pretty anymore (although a large part of the uglyness is just for making it work with Python 2.3 and relative cells).

Then yesterday, while looking at the excel formula parser madness I saw a reference to PyCells, a python port of Cells from CLOS.

Here is a blog commenting on Pycells:

It basically takes the concept of a cell in a spreadsheet that get updated automatically to programming where there are a lot of internal data states that are dependent on one another in a chain, or a complex graph of dependencies. Like, the color of a button depends on whether you selected a radio button or not. Or, shut down the motor if the sensor reads above 100 degrees (example given in text).

Almost everyone uses that analogy... however, no matter how hard I looked, I couldn't find anyone who had actually tried writing a spreadsheet using PyCells! Not even as an example!

So here it is:

import cells

class Cell(cells.Model):
    formula=cells.makecell(value='')
    @cells.fun2cell()
    def value(self,prev):
        print "eval ",self.formula
        return eval(self.formula, {}, self.ss)
    def __init__(self, ss, *args, **kwargs):
        self.ss=ss
        cells.Model.__init__(self, *args, **kwargs)


class ssDict:
        def __init__(self):
                self.ss={}

        def __getitem__(self,key):
                return self.ss[key].value

        def __setitem__(self,key,v):
                if not self.ss.has_key(key):
                        c=Cell(self)
                        c.formula=v
                        self.ss[key]=c
                else:
                        self.ss[key].formula=v

if __name__ == "__main__":
        ss=ssDict()
        ss['a1'] ='5'

        ss['a2']='2*a1'
        print "a1: ", ss['a1']
        print "a2: ", ss['a2']

        ss['a1'] = '7'
        print "a1: ", ss['a1']
        print "a2: ", ss['a2']

And here you can see it running:

[[email protected] cells]$ python ctest.py
a1:  eval  5
5
a2:  eval  2*a1
10
eval  7
eval  2*a1
a1:  7
a2:  14

See how when I set a1 to 7 I get "eval 7" and "eval 2*a1"? That's because it's propagating changes the right way. And that's why this would work as a basis for a spreadsheet.

UPDATE

It seems there is a bug wither in PyCells, or in my example, or something, because it breaks pretty easily, if the dependency chain is even two cells:

a1:  eval  5
5
a2:  eval  2*a1
10
a3:  eval  2*a2
20
eval  7
eval  2*a1
a1:  7
a2:  14
a3:  20

In this example, I am setting A3 to 2*A2, and when I update A1, A3 is not updated. Further research is needed.

UPDATE 2

Please check the reddit comments. They are very educational!

2007-05-21 20:57

Excel formula parsers are hell

On 2004 I wrote a spreadsheet in python, which was about a 25KB download (compressed). It was pretty functional!.

The main problem that would never let that program be a real application was that it used python as a formula language instead of the traditional Lotus/Excel thing.

Yesterday I decided to look at it again, and see if there was a way around it [1]. My main asset was that I know a fair bit more about parsers now than I did then, and how hard could it be?

Well, it turns out that language is a whole lot harder than I knew!

I found this page about a parser

Here's an example from it:

=IF(R[39]C[11]>65,R[25]C[42],ROUND((R[11]C[11]*IF(OR(AND(R[39]C[11]>=55,
R[40]C[11]>=20),AND(R[40]C[11]>=20,R11C3="YES")),R[44]C[11],R[43]C[11]))+
(R[14]C[11] *IF(OR(AND(R[39]C[11]>=55,R[40]C[11]>=20),AND(R[40]C[11]>=20,
R11C3="YES")), R[45]C[11],R[43]C[11])),0))

Yes, that's a valid Excel formula. No, I have no idea what it does.

And here's some extra nuggets about the language (quoted from the same page):

  • For some reason Excel uses a comma as the range union operator. This means that its only unambiguous use between ranges is outside of a function call or within a subexpression (i.e., between parentheses).
  • For some other reason Excel uses a space (or multiple spaces) as the range intersection operator. While not as ambiguous as the comma, it does require some consideration.

And it goes on, and on, and on....

But hey, he wrote a nice parser!

Just for kicks, here's the best BNF I could find.

I really don't envy the KSpread authors if they try to be compatible to this garbage!

But why is this language so gnarly?

  • Because it has been growing organically for 30 years?
  • Because of backwards compatibility?
  • Because it was never designed?
  • All of the above?

I mean, it can't be intentional! Noone actually meant it to be like this, right?

[1] Yes, I routinely look at things I wrote and haven't touched in 2.4 years.

2007-05-18 14:25

Small software released: RA-WebPass

I just released a wee piece of software, called RA-WebPass which is simply a webpage that you can use to change linux system passwords.

Basically, I wrote it today out of my frustration with customers asking me "how can I change the password for the FTP server". Ok, this is how you can.

It's simple and doesn't have too many dependencies, so it should be rather secure, but don't trust me on that, and you need to run it as root, so be very very careful.

RA-WebPass home page

On a related note: it's quite satisfying to write something in two hours and just release it :-)

2007-05-15 12:44

BartleBlog change: Mako Templates

Since the very beginning, BartleBlog has been using CherryTemplate for its output formatting needs. I like it, because it's very simple.

However, it had grown rather cumbersome.

Specifically, most pages in a blog are sort of a page template with a body template inside (the main content).

To do that on CherryTemplate, I used a two-pass approach: generate the body, then pass it as parameter to the page template.

Which is a pain in some cases because you end basically having to do a rendering function for each kind of page, or some crazy-evil function (what I did).

Exploring the different python template engines, I ran into Mako and decided to give it a whirl. It looks good.

The approach is a bit different, it is much more powerful, but you can still use it simply if you can.

And the main feature was template inheritance. Using that, no more inner and outer templates, baby!

Oh, and performance is better:

Cherry

real    31m44.732s
user    21m18.336s
sys     2m7.628s


Mako

real    24m54.472s
user    19m9.508s
sys     1m56.375s

This is for completely rerendering the whole 7 years, 574 posts, 40 static articles, 14 category blog, and there is tons of optimizations to be done.

BTW: this is how you rerender the whole blog:

from BartleBlog.backend.blog import Blog
Blog().renderFullBlog()

2007-05-11 12:01

New Bartleblog Feature: Menu Editor

Took a while to implement, but BartleBlog finally got a functional menu editor:

bartleblog12.png

Right now, it only works with the mootools-based menu gadget, but I will start working on the yahoo menu version in a moment.

The only thing not working is the preview button, because it needs more support on the backend side.

2007-05-11 08:59

Python Trick: Save anything in config files

The Python config objects are convenient and simple, but they have a problem: you can only save strings. That means you need to store numbers as strings and remember to use the getint()/getfloat() methods (or coerce by hand!), which is error prone and anti-pythonic. Storing a list is even uglier.

You could store ascii pickles, but those are pretty unpleasant to read in some cases.

Here's my solution: Encode it using a JSON encoder first! (I am using demjson)

Silly obvious code fragment:

def getValue(section,key,default=None):
    try:
        return JSON().decode(conf.get (section,key))
    except:
        return default

def setValue(section,key,value):
    value=JSON().encode(value)
    try:
        r=conf.set(section,key,value)
    except ConfigParser.NoSectionError:
        conf.add_section(section)
        r=conf.set(section,key,value)
    f=open(os.path.expanduser('~/.bartleblog/config'),'w')
    conf.write(f)
    return r

With just a little effort you can have a readable ascii typed python config file.

2007-05-10 14:07

Today's first hour of hacking...

... has been all about UI.

I have always had a problem when writing PyQt apps: stock icons.

Which ones should I use? Where are they?

I usually fished through the crystalsvg icon set until I found one that seemed to be what I needed, and then copied it to my app.

Sadly, that's annoying in several ways:

  1. Since those are PNG icons, you need to find the right size.
  2. Not all icons are there for all sizes!
  3. Because of 2, I need to check three or four folders to see all the icons.

So, I decided to cut my losses, and see what else could be done. And here it is:

bartleblog11.png

I am now using all SVG icons, from the reinhardt set that will look equally out of place in all OSs, but which I like (and I think look awesome with this relaxed Domino theme). And because they are all SVG, I don't care about sizes, and they are all in the same place, and all is good.

And whenever Oxygen is released, all I need to do is switch the files around and that's that. Which is nice, too.

Of course there is a catch... it does look out of place, and I expect many to find it ugly. So what, since I am the only user of this app! ;-)

Contents © 2000-2019 Roberto Alsina