Ir al contenido principal

Ralsina.Me — El sitio web de Roberto Alsina

Publicaciones sobre programming (publicaciones antiguas, página 15)

C is not Python II.

RaSPF, my C port of PySPF, is pret­ty much func­tion­al right now.

Here's what I mean:

  • It pass­es 75 in­­ter­­nal unit tests (ok, 74 , but that one is ar­guable).

  • It pass­es 137 of 145 tests of the SPF of­­fi­­cial test suit­­e.

  • It agrees with PySPF in 181 of the 183 cas­es of the lib­spf2 live DNS suit­­e.

  • It seg­­faults in none of the 326 test cas­es.

So, while there are still some cor­ner cas­es to de­bug, it's look­ing very good.

I even spent some time with val­grind to plug some leaks ( the in­ter­nal test suite runs al­most leak­less, the re­al app is a sieve ;-)

All in al­l, if I can spend a lit­tle while with it dur­ing the week, I should be able to make a re­lease that ac­tu­al­ly work­s.

Then, I can re­write my SPF plug­in for qmail, which was what sent me in this mon­th-log tan­gen­t.

As a lan­guage wars com­par­ison:

  • The sloc­­count of raspf is 2557 (or 2272 if we use the ragel gram­­mar source in­­stead of the gen­er­at­ed file)

  • The sloc­­count of PySPF is 993.

So, a 2.6:1 or 2.28:1 code ra­tio.

How­ev­er, I used 4 non-­s­tan­dard C li­braries: bstr­lib, udns, and helpers for hash­es and ex­cep­tion­s, which add an­oth­er 5794 LOC­s.

So, it could be ar­gued as a 8:1 ra­tio, too, but my C code is prob­a­bly ver­bose in ex­treme, and many C lines are not re­al­ly "log­ic" but dec­la­ra­tions and such.

Al­so, I did not write PySPF, so his code may be more con­cise, but I tried my best to copy the flow as much as pos­si­ble line-per-­line.

In short, you need to write, ac­cord­ing to this case, be­tween 2 and 8 times more code than you do in Python.

That's a bit much!

The middle path

In my pre­vi­ous post, I men­tioned how PySPF does some­thing us­ing a reg­u­lar ex­pres­sion which I could­n't eas­i­ly re­pro­duce in C.

So, I start­ed look­ing at pars­er gen­er­a­tors to use the orig­i­nal SPF RFC's gram­mar.

But that had its own prob­lem­s.... and then came ragel.

Ragel is a fi­nite state ma­chine com­pil­er, and you can use it to gen­er­ate sim­ple parsers and val­ida­tors.

The syn­tax is very sim­ple, the re­sults are pow­er­ful, and here's the main chunk of code that lets you parse a SPF do­main-spec (it work­s, too!):

machine domain_spec;
name = ( alpha ( alpha | digit | '-' | '_' | '.' )* );
macro_letter = 's' | 'l' | 'o' | 'd' | 'i' | 'p' | 'h' | 'c' | 'r' | 't';
transformers = digit* 'r'?;
delimiter = '.' | '-' | '+' | ',' | '|' | '_' | '=';
macro_expand = ( '%{' macro_letter transformers delimiter* '}' ) |
               '%%' | '%_' | '%-';
toplabel = ( alnum* alpha alnum* ) |
           ( alnum{1,} '-' ( alnum | '-' )* alnum );
domain_end = ( '.' toplabel '.'? ) | macro_expand;
macro_literal = 0x21 .. 0x24 | 0x26 .. 0x7E;
macro_string = ( macro_expand | macro_literal )*;
domain_spec := macro_string domain_end 0 @{ res = 1; };

And in fac­t, it's sim­pler than the AB­NF gram­mar used in the RFC:

name             = ALPHA *( ALPHA / DIGIT / "-" / "_" / "." )
macro-letter     = "s" / "l" / "o" / "d" / "i" / "p" / "h" /
                   "c" / "r" / "t"
transformers     = *DIGIT [ "r" ]
delimiter        = "." / "-" / "+" / "," / "/" / "_" / "="
macro-expand     = ( "%{" macro-letter transformers *delimiter "}" )
                   / "%%" / "%_" / "%-"
toplabel         = ( *alphanum ALPHA *alphanum ) /
                   ( 1*alphanum "-" *( alphanum / "-" ) alphanum )
domain-end       = ( "." toplabel [ "." ] ) / macro-expand
macro-literal    = %x21-24 / %x26-7E
macro-string     = *( macro-expand / macro-literal )
domain-spec      = macro-string domain-end

So, thumbs up for ragel!

Up­date:

  • The code looks very bad on python or agre­­ga­­tors.

  • This piece of code alone fixed 20 test cas­es from the SPF suit­­e, and now on­­ly 8 fail. Neat!

This can't be good

Work­ing on my SPF li­brary, I ran in­to a prob­lem. I need­ed to val­i­date a spe­cif­ic el­e­men­t, and the python code is a lit­tle hairy (it splits based on a large reg­ex­p, and it's tricky to con­vert to C).

So, I asked, and was told, maybe you should start from the RFC's gram­mar.

Ok. I am not much in­to gram­mars and parser­s, but what the heck. So I check it. It's a AB­NF gram­mar.

So, I look for the ob­vi­ous thing: a AB­NF pars­er gen­er­a­tor.

There are very few of those, and none of them seems very solid, which is scary, be­cause al­most all the RFC's de­fine ev­ery­thing in terms of AB­NF (ex­cept for some that do worse, and de­fine in pros­e. Did you know there is no for­mal, ver­i­fi­able def­i­ni­tion of what an Ipv6 ad­dress looks like?).

So, af­ter hours of googling...

Any­one knows a good AB­NF pars­er gen­er­a­tor? I am try­ing with ab­n­f2c but it's not strict enough (I am get­ting a pars­er that does­n't work).

Any­one knows why those very im­por­tant doc­u­ments that rule how most of us make a liv­ing/­work/have fun are so ... hazy?

My SPF library kinda works

RaSPF, my at­tempt­ed port of PySPF to C is now at a very spe­cial point in its life:

The pro­vid­ed CLI ap­pli­ca­tion can check SPF records and tell you what you should do with them!

Here's an ex­am­ple:

[ralsina@monty build]$ ./raspfquery --ip=192.0.2.1 --sender=03.spf1-test.mailzone.com --helo=03.spf1-test.mailzone.com
Checking SPF with:

sender: 03.spf1-test.mailzone.com
helo:   03.spf1-test.mailzone.com
ip:     192.0.2.1


response:       softfail
code:           250
explanation:    domain owner discourages use of this host

Is that cor­rec­t? Ap­par­ent­ly yes!

[ralsina@monty pyspf-2.0.2]$ python spf.py 192.0.2.1 03.spf1-test.mailzone.com 03.spf1-test.mailzone.com
('softfail', 250, 'domain owner discourages use of this host')

Is it use­ful? Sure­ly you jest!

There are still the fol­low­ing prob­lem­s:

  • The mem­o­ry man­age­­ment is un­ex­is­­tant

  • I need to hack a way to run the of­­fi­­cial SPF test suite so I can see how well it works and that it works ex­ac­t­­ly as PySPF

  • It prob­a­bly will seg­­fault on many places

  • I am chang­ing the er­ror han­dling to be ex­­cep­­tion-based, thanks to EX­CC

  • The IPv6 sup­­port is be­tween iffy and not there

  • There is no sup­­port for SPF (type 99) DNS record­s, on­­ly TXT records (need to hack the udns li­brary)

But re­al­ly, this should be about 60% of the work, and it does work for some cas­es, which is more than I re­al­ly ex­pect­ed at the be­gin­ning.

Here's the whole source code of the sam­ple ap­pli­ca­tion (ex­cept for CLI op­tion pro­cess­ing):

spf_init();
spf_response r=spf_check(ip,sender,helo,0,0);
printf ("\nresponse:\t%s\ncode:\t\t%d\nexplanation:\t\t%s\n",
        r.response,r.code,r.explanation);

C is not Python

I am port­ing pyspf to C (long sto­ry, and I am stupid for try­ing). But of course, C is not python.

So you don't have anything nearly as nice as re.­com­pile("what­ev­er").s­plit("­somestring").

What is that good for, you may ask? Well, to do things like split­ting email ad­dress­es while val­i­dat­ing them, or in this spe­cif­ic case, to val­i­date SPF mech­a­nisms (n­ev­er­mind what those are).

But hey, you can al­ways do this (ex­cuse me while I weep a lit­tle):

struct bstrList *re_split(const char *string, const char *pattern)
{
    int status;
    regex_t re;
    regmatch_t pmatch[20];

    if (regcomp(&re, pattern, REG_ICASE|REG_EXTENDED) != 0)
    {
        return(0);      /* Report error. */
    }

    bstring tmp=bfromcstr("");
    char *ptr=(char *)string;

    for (;;)
    {
        status = regexec(&re, ptr, (size_t)20, pmatch, 0);
        if (status==REG_NOMATCH)
        {
            break;
        }
        bcatblk (tmp,ptr,pmatch[0].rm_so);
        bconchar (tmp,0);
        bcatblk (tmp,ptr+pmatch[0].rm_so,pmatch[0].rm_eo-pmatch[0].rm_so);
        bconchar (tmp,0);
        ptr=ptr+pmatch[0].rm_eo;

    }
    regfree(&re);
    bcatblk (tmp,ptr,strlen(string)-(ptr-string));
    struct bstrList *l= bsplit(tmp,0);
    return l;
}

And that is prob­a­bly wrong for some cas­es (and it does­n't split the ex­act same way as Python, but that's what unit test­ing is for).

I must be miss­ing some­thing that makes reg­comp & friends nicer to use. Right? Right?


Contents © 2000-2023 Roberto Alsina