Posts about RaSPF

Official RaSPF page

Ok, time to go a little more public with this.

Here's a page for it (click on "read more") and I will ask the openspf guys to put it on the implementations list (let's see how that goes).

RaSPF on its way to release

I have been able to work some more on RaSPF and the results are encouraging.

Thanks to valgrind and test suites, I am pretty confident it doesn't leak memory, or at least that it only leaks in very rare cases.

I think I found a neat way to simplify memory management, though, and that's what I wanted to mention.

This is probably trivial for everyone reading, but I am a limited C programmer, so whenever something works unexpectedly right, I am happy ;-)

One problem with C memory management is that if you have many exit points for your functions, releasing everything you allocate is rather annoying, since you may have to do it in several different locations.

I compounded this problem because I am using exceptions (yeah, C doesn't have them. I used this).

Now I have to worry not only about my own returns, but also about my own throws and about any uncaught exception thrown by something I called!

Hell, right?

Nope: what exceptions complicated, exceptions fixed. Look at this function:

bstring spf_query_get_explanation(spf_query *q, bstring spec)
{
    bstring txt=0;
    struct bstrList *l=0;
    bstring expanded=0;
    bstring result=0;
    struct tagbstring s=bsStatic("");

    try
    {
        // Expand an explanation
        if (spec && spec->slen)
        {
            expanded=spf_query_expand(q,spec,1);
            l=spf_query_dns_txt(q,expanded);

            if (l)
            {
                txt=bjoin(l,&s);
            }
            else
            {
                txt=bfromcstr("");
            }
            result=spf_query_expand(q,txt,0);
            throw(EXC_OK,0);
        }
        else
        {
            result=bfromcstr("explanation: Required option is missing");
            throw(EXC_OK,0);
        }
    }
    except
    {
        if(expanded) bdestroy(expanded);
        if(txt) bdestroy(txt);
        if(l) bstrListDestroy(l);
        on (EXC_OK)
        {
            return result;
        }
        if(result) bdestroy(result);
        throw(EXCEPTION.type,EXCEPTION.param1);
    }
}

It doesn't matter whether spf_query_expand or spf_query_dns_txt throws an exception: this will not leak.
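For anyone who wants to play with the idea without an exception library, here is a minimal sketch of the same "one cleanup point" shape using plain setjmp/longjmp from the standard library. It is not RaSPF code: build_explanation, lookup, tracked_dup, tracked_free and live_buffers are all made-up names for illustration, and a single global handler is a simplification a real library would not make.

```c
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins, not RaSPF or exception-library code. */
static jmp_buf handler;        /* where a "throw" lands */
static int live_buffers = 0;   /* allocations not yet freed */

static char *tracked_dup(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p == NULL) longjmp(handler, 2);   /* "throw" on out of memory */
    strcpy(p, s);
    live_buffers++;
    return p;
}

static void tracked_free(char *p)
{
    if (p != NULL) { free(p); live_buffers--; }
}

/* A callee that "throws" on demand, like a failing DNS lookup. */
static char *lookup(const char *name, int fail)
{
    if (fail) longjmp(handler, 1);
    return tracked_dup(name);
}

/* Returns a new string on success, NULL if anything threw. */
char *build_explanation(const char *spec, int fail)
{
    /* volatile: assigned after setjmp and read after a longjmp */
    char *volatile expanded = NULL;
    char *volatile txt = NULL;
    char *volatile result = NULL;
    int failed = 0;

    if (setjmp(handler) == 0) {
        expanded = tracked_dup(spec);
        txt = lookup(expanded, fail);
        result = tracked_dup(txt);
    } else {
        failed = 1;                /* we got here through a longjmp */
    }

    /* The single cleanup point: every path ends up here. */
    tracked_free(expanded);
    tracked_free(txt);
    if (failed) {
        tracked_free(result);
        return NULL;
    }
    return result;
}
```

Calling build_explanation("example.org", 1) makes lookup bail out halfway, and live_buffers still ends at zero, which is the whole point: the temporaries are released in exactly one place no matter how the body was left.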

Nice, I think :-)

C is not Python II.

RaSPF, my C port of PySPF, is pretty much functional right now.

Here's what I mean:

  • It passes 75 internal unit tests (ok, 74, but that one is arguable).
  • It passes 137 of 145 tests of the SPF official test suite.
  • It agrees with PySPF in 181 of the 183 cases of the libspf2 live DNS suite.
  • It segfaults in none of the 326 test cases.

So, while there are still some corner cases to debug, it's looking very good.

I even spent some time with valgrind to plug some leaks (the internal test suite runs almost leak-free; the real app is a sieve ;-)

All in all, if I can spend a little while with it during the week, I should be able to make a release that actually works.

Then, I can rewrite my SPF plugin for qmail, which is what sent me on this month-long tangent.

As a language wars comparison:

  • The sloccount of raspf is 2557 (or 2272 if we use the ragel grammar source instead of the generated file)
  • The sloccount of PySPF is 993.

So, a 2.6:1 or 2.3:1 code ratio.

However, I used 4 non-standard C libraries: bstrlib, udns, and helpers for hashes and exceptions, which add another 5794 LOCs.

So, it could be argued to be an 8:1 ratio, too, but my C code is probably verbose in the extreme, and many C lines are not really "logic" but declarations and such.

Also, I did not write PySPF, so its author's code may be more concise than mine would be, but I tried my best to copy the flow as closely as possible, line by line.

In short, you need to write, according to this case, between 2 and 8 times more code than you do in Python.
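If you want to redo the arithmetic with your own sloccount numbers, it is one division. A tiny sketch (code_ratio is just a name I made up for it):

```c
#include <assert.h>

/* Lines of C needed per line of Python, given two sloccount totals. */
double code_ratio(int c_lines, int python_lines)
{
    assert(python_lines > 0);   /* a ratio against zero lines is meaningless */
    return (double)c_lines / (double)python_lines;
}
```

With the figures above, code_ratio(2557, 993) is about 2.58, code_ratio(2272, 993) is about 2.29, and adding the 5794 library lines, code_ratio(2557 + 5794, 993), gives about 8.41.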

That's a bit much!

The middle path

In my previous post, I mentioned how PySPF does something using a regular expression which I couldn't easily reproduce in C.

So, I started looking at parser generators to use the original SPF RFC's grammar.

But that had its own problems.... and then came ragel.

Ragel is a finite state machine compiler, and you can use it to generate simple parsers and validators.

The syntax is very simple, the results are powerful, and here's the main chunk of code that lets you parse an SPF domain-spec (it works, too!):

machine domain_spec;
name = ( alpha ( alpha | digit | '-' | '_' | '.' )* );
macro_letter = 's' | 'l' | 'o' | 'd' | 'i' | 'p' | 'h' | 'c' | 'r' | 't';
transformers = digit* 'r'?;
delimiter = '.' | '-' | '+' | ',' | '/' | '_' | '=';
macro_expand = ( '%{' macro_letter transformers delimiter* '}' ) |
               '%%' | '%_' | '%-';
toplabel = ( alnum* alpha alnum* ) |
           ( alnum{1,} '-' ( alnum | '-' )* alnum );
domain_end = ( '.' toplabel '.'? ) | macro_expand;
macro_literal = 0x21 .. 0x24 | 0x26 .. 0x7E;
macro_string = ( macro_expand | macro_literal )*;
domain_spec := macro_string domain_end 0 @{ res = 1; };

And in fact, it's simpler than the ABNF grammar used in the RFC:

name             = ALPHA *( ALPHA / DIGIT / "-" / "_" / "." )
macro-letter     = "s" / "l" / "o" / "d" / "i" / "p" / "h" /
                   "c" / "r" / "t"
transformers     = *DIGIT [ "r" ]
delimiter        = "." / "-" / "+" / "," / "/" / "_" / "="
macro-expand     = ( "%{" macro-letter transformers *delimiter "}" )
                   / "%%" / "%_" / "%-"
toplabel         = ( *alphanum ALPHA *alphanum ) /
                   ( 1*alphanum "-" *( alphanum / "-" ) alphanum )
domain-end       = ( "." toplabel [ "." ] ) / macro-expand
macro-literal    = %x21-24 / %x26-7E
macro-string     = *( macro-expand / macro-literal )
domain-spec      = macro-string domain-end
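For comparison, here is roughly what checking just one of those rules by hand looks like in plain C. This is a sketch of the toplabel rule only, not what RaSPF does (is_toplabel is a hypothetical name). Both alternatives of the rule boil down to: only alphanumerics and hyphens, alphanumeric at both ends, and either a hyphen somewhere or at least one letter.

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if s matches the toplabel rule quoted above, 0 otherwise. */
int is_toplabel(const char *s)
{
    size_t len = strlen(s);
    int has_alpha = 0;
    int has_hyphen = 0;

    if (len == 0) return 0;

    /* Both alternatives start and end with an alphanumeric character. */
    if (!isalnum((unsigned char)s[0]) ||
        !isalnum((unsigned char)s[len - 1])) return 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c == '-') has_hyphen = 1;
        else if (isalpha(c)) has_alpha = 1;
        else if (!isdigit(c)) return 0;   /* anything else is not allowed */
    }

    /* First alternative needs a letter, the second needs a hyphen. */
    return has_hyphen || has_alpha;
}
```

That is one rule out of ten, and it already needs a comment to explain why it is equivalent to the grammar. The state machine version says the same thing in two lines.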

So, thumbs up for ragel!

Update:

  • The code looks very bad on python or aggregators.
  • This piece of code alone fixed 20 test cases from the SPF suite, and now only 8 fail. Neat!

This can't be good

Working on my SPF library, I ran into a problem. I needed to validate a specific element, and the Python code is a little hairy (it splits based on a large regexp, and it's tricky to convert to C).

So I asked, and was told: maybe you should start from the RFC's grammar.

Ok. I am not much into grammars and parsers, but what the heck. So I checked it. It's an ABNF grammar.

So, I looked for the obvious thing: an ABNF parser generator.

There are very few of those, and none of them seems very solid, which is scary, because almost all the RFCs define everything in terms of ABNF (except for some that do worse, and define things in prose. Did you know there is no formal, verifiable definition of what an IPv6 address looks like?).

So, after hours of googling...

Does anyone know a good ABNF parser generator? I am trying abnf2c, but it's not strict enough (I am getting a parser that doesn't work).

Does anyone know why those very important documents that rule how most of us make a living/work/have fun are so ... hazy?

SPF test suite on RaSPF

Here are the results as of right now:

  • Give the expected results: 82 tests
  • Give the wrong result: 48 tests
  • Give a correct but not preferred result (mostly because of SPF records and IPv6): 6 tests
  • Fail (crash): 9 tests

So, depending on how you look at it, RaSPF passes between 56% and 61% of the tests.

Not bad so far :-)
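The two percentages come from counting the "correct but not preferred" results either as failures or as passes. A small sketch of that calculation (the struct and function names are made up for this post, not part of RaSPF):

```c
/* Tally of one test-suite run, using the four buckets listed above. */
struct suite_tally {
    int expected;       /* gave the expected result */
    int wrong;          /* gave the wrong result */
    int not_preferred;  /* correct, but not the preferred result */
    int crashed;        /* failed outright */
};

int tally_total(const struct suite_tally *t)
{
    return t->expected + t->wrong + t->not_preferred + t->crashed;
}

/* Percentage counting only the expected results as passes. */
double strict_pass_pct(const struct suite_tally *t)
{
    return 100.0 * t->expected / tally_total(t);
}

/* Percentage that also accepts correct-but-not-preferred results. */
double lenient_pass_pct(const struct suite_tally *t)
{
    return 100.0 * (t->expected + t->not_preferred) / tally_total(t);
}
```

For the numbers above (82, 48, 6, 9) the total is 145 tests, the strict rate is about 56.6% and the lenient one about 60.7%.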

Update: As of 20:52 ART, it's 105/0/35/5 and 72-76%. The bad news is that that was all the low hanging fruit, and now it gets much harder.

My SPF library kinda works

RaSPF, my attempted port of PySPF to C, is now at a very special point in its life:

The provided CLI application can check SPF records and tell you what you should do with them!

Here's an example:

[build]$ ./raspfquery --ip=192.0.2.1 --sender=03.spf1-test.mailzone.com --helo=03.spf1-test.mailzone.com
Checking SPF with:

sender: 03.spf1-test.mailzone.com
helo:   03.spf1-test.mailzone.com
ip:     192.0.2.1


response:       softfail
code:           250
explanation:    domain owner discourages use of this host

Is that correct? Apparently yes!

[pyspf-2.0.2]$ python spf.py 192.0.2.1 03.spf1-test.mailzone.com 03.spf1-test.mailzone.com
('softfail', 250, 'domain owner discourages use of this host')

Is it useful? Surely you jest!

There are still the following problems:

  • The memory management is nonexistent
  • I need to hack a way to run the official SPF test suite so I can see how well it works, and that it works exactly like PySPF
  • It will probably segfault in many places
  • I am changing the error handling to be exception-based, thanks to EXCC
  • The IPv6 support is between iffy and not there
  • There is no support for SPF (type 99) DNS records, only TXT records (need to hack the udns library)

But really, this should be about 60% of the work, and it does work for some cases, which is more than I really expected at the beginning.

Here's the whole source code of the sample application (except for CLI option processing):

spf_init();
spf_response r=spf_check(ip,sender,helo,0,0);
printf ("\nresponse:\t%s\ncode:\t\t%d\nexplanation:\t\t%s\n",
        r.response,r.code,r.explanation);
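As for "what you should do with them": assuming the code field is an SMTP reply code (the 250 in the output above suggests so), a caller only needs to look at its class. A hedged sketch, where demo_response, smtp_action and print_decision are names invented here and the field types are my guess from the printf format:

```c
#include <stdio.h>

/* Hypothetical mirror of the three fields printed by the sample program. */
struct demo_response {
    const char *response;
    int code;
    const char *explanation;
};

/* Map an SMTP reply code to what a mail server would do with the message. */
const char *smtp_action(int code)
{
    if (code >= 200 && code < 300) return "accept";   /* 2xx: success */
    if (code >= 400 && code < 500) return "defer";    /* 4xx: temporary failure */
    if (code >= 500 && code < 600) return "reject";   /* 5xx: permanent failure */
    return "unknown";
}

void print_decision(const struct demo_response *r)
{
    printf("%s (%d): %s the message\n",
           r->response, r->code, smtp_action(r->code));
}
```

So the softfail above, with its 250, means the message is still accepted, just with a raised eyebrow.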

My SPF lib improving

It now can do a bunch of things like expanding macros and (in some cases) validating mechanisms.

I am making very heavy use of unit testing, because it's a pretty complex piece and each function needs to do exactly the right thing or everything else fails (it's pretty hard to figure out where it will fail ;-)
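To give an idea of the granularity, here is a sketch of the kind of small helper that macro expansion needs, plus the sort of check a unit test would make. SPF macros can ask for a value with its dot-separated parts reversed (the "r" transformer), so this reverses labels. It is an illustration written for this post, not the RaSPF function; reverse_labels and its signature are made up.

```c
#include <string.h>

/* Reverse the dot-separated labels of `in` into `out`.
   Returns 0 on success, -1 if `out` (of size out_len) is too small. */
int reverse_labels(const char *in, char *out, size_t out_len)
{
    size_t len = strlen(in);
    size_t end = len;   /* one past the end of the current label */
    size_t pos = 0;     /* write position in out */

    if (out_len < len + 1) return -1;

    for (;;) {
        /* Walk back to the start of the label that ends at `end`. */
        size_t start = end;
        while (start > 0 && in[start - 1] != '.') start--;

        memcpy(out + pos, in + start, end - start);
        pos += end - start;

        if (start == 0) break;   /* that was the first label of the input */
        out[pos++] = '.';
        end = start - 1;         /* skip the dot before this label */
    }
    out[pos] = '\0';
    return 0;
}
```

A test for it is then a couple of asserts: "mail.example.com" must come back as "com.example.mail", and a buffer that is too small must return -1 instead of overflowing.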

You can check the 947 LOC thing at http://code.google.com/p/raspf (the Code tab).

If you do check it, keep in mind the following:

  • It uses a few libs, and they are included in the source code for simplicity.
  • I do sometimes commit code that doesn't compile
  • I do sometimes commit code that fails tests
  • You need cmake
  • I am not giving a damn about memory management right now, so don't bother worrying about leaks: everything leaks in this code. I want to make it functional first, then I can plug it one function at a time (simply by running the unit testing code with a memory checker).

Enjoy (although it's not precisely enjoyable code right now ;-)

C is not Python

I am porting pyspf to C (long story, and I am stupid for trying). But of course, C is not Python.

So you don't have anything nearly as nice as re.compile("whatever").split("somestring").

What is that good for, you may ask? Well, to do things like splitting email addresses while validating them, or in this specific case, to validate SPF mechanisms (never mind what those are).

But hey, you can always do this (excuse me while I weep a little):

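#include <regex.h>
#include <string.h>
#include "bstrlib.h"

/* Split string on every match of pattern. The matched separators are
   kept as list items too. Returns 0 if the pattern doesn't compile. */
struct bstrList *re_split(const char *string, const char *pattern)
{
    regex_t re;
    regmatch_t pmatch[1];
    const char *ptr=string;

    if (regcomp(&re, pattern, REG_ICASE|REG_EXTENDED) != 0)
    {
        return(0);      /* Report error. */
    }

    /* Build "piece\0separator\0piece..." and split it on NUL at the end. */
    bstring tmp=bfromcstr("");

    while (regexec(&re, ptr, (size_t)1, pmatch, 0) == 0)
    {
        if (pmatch[0].rm_eo==0)
        {
            break;      /* Empty match at ptr: stop instead of looping forever. */
        }
        bcatblk (tmp,ptr,pmatch[0].rm_so);
        bconchar (tmp,0);
        bcatblk (tmp,ptr+pmatch[0].rm_so,pmatch[0].rm_eo-pmatch[0].rm_so);
        bconchar (tmp,0);
        ptr=ptr+pmatch[0].rm_eo;
    }
    regfree(&re);

    /* Whatever is left after the last match. */
    bcatblk (tmp,ptr,(int)strlen(ptr));
    struct bstrList *l= bsplit(tmp,0);
    bdestroy(tmp);      /* bsplit copied the pieces, so tmp can go. */
    return l;
}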

And that is probably wrong for some cases (and it doesn't split the exact same way as Python, but that's what unit testing is for).
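To see what the splitting actually produces without pulling in bstrlib, here is a self-contained sketch using only POSIX regex.h. It is a variation for illustration, not the function above: split_join is a made-up name, and instead of returning a list it writes the pieces and separators into a caller-supplied buffer, joined by '|'.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Split s on pattern and write pieces and separators into out, joined
   by '|'. Returns the number of items, or -1 on a bad pattern or if out
   (of size out_len) is too small. */
int split_join(const char *s, const char *pattern, char *out, size_t out_len)
{
    regex_t re;
    regmatch_t m;
    size_t used = 0;
    int items = 0;
    int n;

    if (out_len == 0 || regcomp(&re, pattern, REG_EXTENDED) != 0) return -1;
    out[0] = '\0';

    /* Stop on no match, or on an empty match at s (it would never advance). */
    while (regexec(&re, s, 1, &m, 0) == 0 && m.rm_eo > 0) {
        /* Text before the match, then the match itself. */
        n = snprintf(out + used, out_len - used, "%.*s|%.*s|",
                     (int)m.rm_so, s,
                     (int)(m.rm_eo - m.rm_so), s + m.rm_so);
        if (n < 0 || (size_t)n >= out_len - used) { regfree(&re); return -1; }
        used += (size_t)n;
        items += 2;
        s += m.rm_eo;
    }
    regfree(&re);

    /* Whatever is left after the last match. */
    n = snprintf(out + used, out_len - used, "%s", s);
    if (n < 0 || (size_t)n >= out_len - used) return -1;
    return items + 1;
}
```

Splitting "a=b;c" on the pattern "[=;]" this way gives five items, "a|=|b|;|c", which is the same shape Python's re.split gives when the whole pattern is wrapped in a capturing group.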

I must be missing something that makes regcomp & friends nicer to use. Right? Right?

Itching.

Ok, the SPF implementation situation is kinda pathetic.

There seems to be exactly one maintained C implementation. And it's Windows-only.

  • libspf's website seems to have disappeared
  • libspf2's not RFC-compliant (verified for 1.2.5) and their issue reporting system bounces.

So, I have taken the most compliant one I found whose code I can actually follow (that would be the Python one) and am reimplementing it in C (using bstrlib and libdjbdns).

It will probably not come to a good end, but hey, it may work ;-)