The middle path
In my previous post, I mentioned how PySPF does something using a regular expression which I couldn't easily reproduce in C.
So I started looking at parser generators, hoping to use the grammar from the original SPF RFC directly.
But that had its own problems... and then came Ragel.
Ragel is a finite state machine compiler, and you can use it to generate simple parsers and validators.
The syntax is very simple, the results are powerful, and here's the main chunk of code that lets you parse an SPF domain-spec (it works, too!):
    machine domain_spec;

    name          = ( alpha ( alpha | digit | '-' | '_' | '.' )* );
    macro_letter  = 's' | 'l' | 'o' | 'd' | 'i' | 'p' | 'h' | 'c' | 'r' | 't';
    transformers  = digit* 'r'?;
    # the RFC's delimiter set includes '/', not '|'
    delimiter     = '.' | '-' | '+' | ',' | '/' | '_' | '=';
    macro_expand  = ( '%{' macro_letter transformers delimiter* '}' )
                  | '%%' | '%_' | '%-';
    toplabel      = ( alnum* alpha alnum* )
                  | ( alnum{1,} '-' ( alnum | '-' )* alnum );
    domain_end    = ( '.' toplabel '.'? ) | macro_expand;
    macro_literal = 0x21 .. 0x24 | 0x26 .. 0x7E;
    macro_string  = ( macro_expand | macro_literal )*;

    # the trailing 0 is the C string terminator; reaching it sets res
    domain_spec := macro_string domain_end 0 @{ res = 1; };
And in fact, it's simpler than the ABNF grammar used in the RFC:
    name          = ALPHA *( ALPHA / DIGIT / "-" / "_" / "." )
    macro-letter  = "s" / "l" / "o" / "d" / "i" / "p" / "h" / "c" / "r" / "t"
    transformers  = *DIGIT [ "r" ]
    delimiter     = "." / "-" / "+" / "," / "/" / "_" / "="
    macro-expand  = ( "%{" macro-letter transformers *delimiter "}" )
                    / "%%" / "%_" / "%-"
    toplabel      = ( *alphanum ALPHA *alphanum )
                    / ( 1*alphanum "-" *( alphanum / "-" ) alphanum )
    domain-end    = ( "." toplabel [ "." ] ) / macro-expand
    macro-literal = %x21-24 / %x26-7E
    macro-string  = *( macro-expand / macro-literal )
    domain-spec   = macro-string domain-end
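For comparison, here is roughly what just one of those rules, macro-expand, looks like when matched by hand in C. This is only an illustrative sketch, not Ragel's output and not the code from my SPF work: the function name `match_macro_expand` and its "return the matched length" convention are made up for this example, and it follows the lowercase-only letters exactly as the rule is written above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* 1 if c is one of the macro letters s l o d i p h c r t, else 0. */
static int is_macro_letter(char c)
{
    return c != '\0' && strchr("slodiphcrt", c) != NULL;
}

/* 1 if c is one of the delimiters . - + , / _ =, else 0. */
static int is_delimiter(char c)
{
    return c != '\0' && strchr(".-+,/_=", c) != NULL;
}

/*
 * Hypothetical helper: returns the length of the macro-expand found at
 * the start of s, or 0 if s does not start with one.
 */
size_t match_macro_expand(const char *s)
{
    size_t i;

    assert(s != NULL);

    if (s[0] != '%')
        return 0;

    /* The three two-character forms: "%%", "%_" and "%-". */
    if (s[1] == '%' || s[1] == '_' || s[1] == '-')
        return 2;

    /* Otherwise it must be "%{" followed by a macro letter. */
    if (s[1] != '{' || !is_macro_letter(s[2]))
        return 0;

    i = 3;
    while (isdigit((unsigned char)s[i]))    /* transformers: *DIGIT  */
        i++;
    if (s[i] == 'r')                        /* transformers: [ "r" ] */
        i++;
    while (is_delimiter(s[i]))              /* *delimiter            */
        i++;

    /* The expansion is only valid if it is closed by "}". */
    return s[i] == '}' ? i + 1 : 0;
}
```

Calling `match_macro_expand("%{d2r.-}")` consumes the whole string, while `match_macro_expand("%{x}")` returns 0 because `x` is not a macro letter. That is one rule out of ten, and every other rule would need the same kind of hand-written bookkeeping.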
So, thumbs up for Ragel!
Update:
The code looks very bad on python or aggregators.
This piece of code alone fixed 20 test cases from the SPF suite, and now only 8 fail. Neat!