Excel formula parsers are hell

On 2004 I wrote a spreadsheet in python, which was about a 25KB download (compressed). It was pretty functional!.

The main problem that would never let that program be a real application was that it used python as a formula language instead of the traditional Lotus/Excel thing.

Yesterday I decided to look at it again, and see if there was a way around it [1]. My main asset was that I know a fair bit more about parsers now than I did then, and how hard could it be?

Well, it turns out that language is a whole lot harder than I knew!

I found this page about a parser

Here's an example from it:

(R[14]C[11] *IF(OR(AND(R[39]C[11]>=55,R[40]C[11]>=20),AND(R[40]C[11]>=20,
R11C3="YES")), R[45]C[11],R[43]C[11])),0))

Yes, that's a valid Excel formula. No, I have no idea what it does.

And here's some extra nuggets about the language (quoted from the same page):

  • For some reason Excel uses a comma as the range union operator. This means that its only unambiguous use between ranges is outside of a function call or within a subexpression (i.e., between parentheses).
  • For some other reason Excel uses a space (or multiple spaces) as the range intersection operator. While not as ambiguous as the comma, it does require some consideration.

And it goes on, and on, and on....

But hey, he wrote a nice parser!

Just for kicks, here's the best BNF I could find.

I really don't envy the KSpread authors if they try to be compatible to this garbage!

But why is this language so gnarly?

  • Because it has been growing organically for 30 years?
  • Because of backwards compatibility?
  • Because it was never designed?
  • All of the above?

I mean, it can't be intentional! Noone actually meant it to be like this, right?

[1] Yes, I routinely look at things I wrote and haven't touched in 2.4 years.


Comments powered by Disqus