Ralsina.Me — Roberto Alsina's website

This can't be good

Working on my SPF library, I ran into a problem. I needed to validate a specific element, and the Python code is a little hairy (it splits based on a large regexp, and it's tricky to convert to C).

So I asked, and was told: maybe you should start from the RFC's grammar.

Ok. I am not much into grammars and parsers, but what the heck. So I check it. It's an ABNF grammar.

So, I look for the obvious thing: an ABNF parser generator.

There are very few of those, and none of them seems very solid, which is scary, because almost all the RFCs define everything in terms of ABNF (except for some that do worse and define things in prose. Did you know there is no formal, verifiable definition of what an IPv6 address looks like?).

So, after hours of googling...

Does anyone know a good ABNF parser generator? I am trying abnf2c, but it's not strict enough (I am getting a parser that doesn't work).

Does anyone know why those very important documents that rule how most of us make a living/work/have fun are so ... hazy?

Martin Ellis / 2007-03-04 18:23:

Uhm... perhaps first, you really want something that will tell you what class of grammar you're dealing with?

As far as I know, all the 'compiler-compiler' tools tend to be specific to a particular class of grammar: they'll either generate lexical analysers, LL parsers or LR parsers, or whatever...
I don't think I've ever seen a (useful) tool that will do more than one of these things.

Are you just matching a reg. ex., or is there something trickier going on?

Roberto Alsina / 2007-03-04 18:35:

There's an element called a domain-spec.

It's defined in an ABNF grammar.

The Python version of the code validates it by splitting it using a regexp.

If you email me, I can show you the code (and the ABNF grammar :-)

Martin Ellis / 2007-03-04 19:21:

So, it looks like it's just a regexp.

I think you should be able to persuade flex to match the pattern, and use its 'rules' to track which parts of the string correspond to which parts of the grammar.

Perhaps define things like toplabel, delimiter and macro-literal in the definitions section of your flex input, and put domain-spec in the rules section.

It's been about 4 years since I looked at this stuff though - and even then, I was only doing a toy example or two - so I'm a bit hazy on it all too...
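
To illustrate the "it's just a regexp" idea from this comment, here is a minimal Python sketch that builds one regular expression for domain-spec out of smaller named pieces (toplabel, delimiter, macro-literal and so on), much as one would lay out flex definitions. The rules are transcribed from my reading of the SPF RFC's grammar and are an assumption to be checked against the RFC itself, in particular the macro letter list. The function name is_valid_domain_spec is hypothetical, and this is neither the code from the Python SPF library nor the parser Roberto ended up with.

```python
import re

# Building blocks, named after the rules mentioned in the thread.
# Assumption: transcribed by hand from the SPF RFC's ABNF; verify before relying on it.
ALPHANUM = r"[A-Za-z0-9]"
# toplabel: either contains at least one letter, or contains an inner hyphen.
TOPLABEL = (
    "(?:" + ALPHANUM + "*[A-Za-z]" + ALPHANUM + "*"
    "|" + ALPHANUM + "+-(?:" + ALPHANUM + "|-)*" + ALPHANUM + ")"
)
DELIMITER = r"[.\-+,/_=]"
# macro-expand: %{letter digits [r] delimiters}, or one of %% %_ %-
# ABNF string literals are case-insensitive, hence both cases here.
# Assumption: the letter list below (including "v") should be checked against the RFC.
MACRO_EXPAND = (
    r"(?:%\{[slodiphcrtvSLODIPHCRTV][0-9]*[rR]?" + DELIMITER + r"*\}|%%|%_|%-)"
)
# macro-literal: any visible ASCII character except "%".
MACRO_LITERAL = r"[\x21-\x24\x26-\x7e]"
MACRO_STRING = "(?:" + MACRO_EXPAND + "|" + MACRO_LITERAL + ")*"
# domain-end: ".toplabel" with an optional trailing dot, or a macro-expand.
DOMAIN_END = r"(?:\." + TOPLABEL + r"\.?|" + MACRO_EXPAND + ")"

DOMAIN_SPEC = re.compile(MACRO_STRING + DOMAIN_END)


def is_valid_domain_spec(text):
    """Return True if the whole string matches the domain-spec pattern."""
    return DOMAIN_SPEC.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["example.com", "_spf.%{d}.example.com", "example.123", "%{x}.com"]:
        print(candidate, is_valid_domain_spec(candidate))
```

This only validates; it does not report which part of the string matched which rule. It also leans on the backtracking in Python's re module to find where macro-string stops and domain-end begins, so the pattern would need rethinking before porting to a matcher that does not backtrack.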

Don Hensley / 2007-03-04 19:24:

This will help you see why the RFCs are so fuzzy.
http://en.wikipedia.org/wik...

They may well be the first example of a form of Wiki - anyone could comment. Out of the comments came (usually) consensus.

It worked because most of us are going to go with the proposals with merit, and the other ideas sink.

And they are not any kind of mandatory. You can ignore them if you wish (usually a VERY bad idea).

Don.

Roberto Alsina / 2007-03-04 19:49:

Martin: I think I found a way to create the parser (just a validating parser, since I already do everything else).

The grammar for this specific element is not terribly complex, so I think I can manage.

Roberto Alsina / 2007-03-04 20:16:

Don: Well, the problem I mention is that the proposals are defined using either prose or a grammar for which no good parser seems to exist (hell, the ABNF grammar's grammar is broken, too!)

It would be no harder to propose using, you know, things that are verifiable, so people can know if they are complying with the proposal or not :-)