Posts about python (old posts, page 9)

2007-03-24 09:16

A little project, son of BartleBlog

I have been posting this blog using PyDS for over 4 years now. Sadly, its author seems to have abandoned it, even though it's nifty software.

Keeping it working gets harder every year, and I don't expect to be able to keep doing it much longer.

Also, the data lives in a Metakit database, which is the most annoying DB ever (no real schema! columnar instead of record-oriented! gouge my eyes with a breadstick!)

So, since I have all the data, and my blogging needs are modest, and no tool does exactly what I want, I decided to write my own.

I could make it a web app, maybe using TurboGears, but what the heck: I haven't done a decent GUI app in... OK, arguably I have never done a decent one, my PyQt4 skills need some work, and lately I am kinda in a groove for actually finishing things (I am rather proud of RaSPF).

And I have a neat name (BartleBlog) reserved from another aborted app.

So, here's the mandatory screenshot after a couple hours hacking:

[Screenshot: BartleBlog]

And here are the goals:

  • Generate static pages, so it can be used by anyone with a little web space (I am a gypsy)
  • Simple templating (using cherrytemplate right now, but it should be modular)
  • reStructuredText as the input mechanism (again, modular)
  • Good support for code snippets
  • Support for static pages (like the ones I have in the Stories link)
  • Integrate with Flickr for images
  • Integrate "chunks" in the templating, so you can easily do things like setting the right Haloscan comment/trackback links
  • A simple category mechanism with a regexp-based autotagger, without creating per-category copies of everything
  • RSS feed generation, global and per-category
  • A way to import my whole PyDS blog (and maybe my older advogato things)
  • Use sqlite and SQLObject for sane storage
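A regexp-based autotagger like the one in the list could work roughly like this: each category is defined by a pattern, and a post's categories are computed from its text instead of being stored as per-category copies. This is a minimal sketch using POSIX regcomp/regexec; the rule table, the category names and autotag() are hypothetical illustrations, not BartleBlog's actual design.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical rules: each category is defined by one regex over the post text. */
struct tag_rule {
    const char *category;
    const char *pattern;  /* POSIX extended regex, matched case-insensitively */
};

static const struct tag_rule RULES[] = {
    { "python", "python|pyqt" },
    { "spf",    "spf" },
    { "linux",  "linux|opie" },
};

#define N_RULES (sizeof RULES / sizeof RULES[0])

/* Returns a bitmask with bit i set when RULES[i] matches text, or -1 if
   a pattern fails to compile. The post is stored once; its categories
   are derived from the text on demand. */
long autotag(const char *text)
{
    long mask = 0;
    for (size_t i = 0; i < N_RULES; i++) {
        regex_t re;
        if (regcomp(&re, RULES[i].pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
            return -1;
        if (regexec(&re, text, 0, NULL, 0) == 0)
            mask |= 1L << i;
        regfree(&re);
    }
    return mask;
}
```

Recompiling every pattern on each call keeps the sketch short; a real tagger would compile the rules once and reuse them.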

So far it does a few things: I can import, edit and save (changes apply instantly; there is no "save" step).

I can't yet generate the site or create a new post, and it will probably take months to become useful, but let's see how it goes.

2007-03-06 14:23

C is not Python II.

RaSPF, my C port of PySPF, is pretty much functional right now.

Here's what I mean:

  • It passes 75 internal unit tests (OK, 74, but that one is arguable).
  • It passes 137 of 145 tests of the SPF official test suite.
  • It agrees with PySPF in 181 of the 183 cases of the libspf2 live DNS suite.
  • It segfaults in none of the 326 test cases.

So, while there are still some corner cases to debug, it's looking very good.

I even spent some time with valgrind plugging leaks (the internal test suite runs almost leak-free; the real app is a sieve ;-)

All in all, if I can spend a little while with it during the week, I should be able to make a release that actually works.

Then I can rewrite my SPF plugin for qmail, which is what sent me on this month-long tangent.

As a language wars comparison:

  • The sloccount of raspf is 2557 (or 2272 if we use the ragel grammar source instead of the generated file)
  • The sloccount of PySPF is 993.

So, roughly a 2.6:1 or 2.3:1 code ratio.

However, I used 4 non-standard C libraries: bstrlib, udns, and helpers for hashes and exceptions, which add another 5794 LOCs.

So one could argue for an 8:1 ratio, too, but my C code is probably extremely verbose, and many C lines are not really "logic" but declarations and such.
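For reference, here is the arithmetic behind those ratios (lines of C per line of Python, first with and without the ragel-generated file, then adding the 5794 lines of helper libraries):

```latex
\[
\frac{2557}{993} \approx 2.58, \qquad \frac{2272}{993} \approx 2.29
\]
\[
\frac{2557 + 5794}{993} = \frac{8351}{993} \approx 8.41, \qquad
\frac{2272 + 5794}{993} = \frac{8066}{993} \approx 8.12
\]
```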

Also, I did not write PySPF, so its author may simply write more concise code than I do, but I tried my best to copy its flow line by line as much as possible.

In short, going by this one case, you need to write between 2 and 8 times as much code in C as in Python.

That's a bit much!

2007-03-04 21:10

The middle path

In my previous post, I mentioned how PySPF does something using a regular expression which I couldn't easily reproduce in C.

So I started looking at parser generators that could use the grammar from the SPF RFC.

But that had its own problems... and then came ragel.

Ragel is a finite state machine compiler, and you can use it to generate simple parsers and validators.

The syntax is very simple, the results are powerful, and here's the main chunk of code that lets you parse an SPF domain-spec (it works, too!):

machine domain_spec;
name = ( alpha ( alpha | digit | '-' | '_' | '.' )* );
macro_letter = 's' | 'l' | 'o' | 'd' | 'i' | 'p' | 'h' | 'c' | 'r' | 't';
transformers = digit* 'r'?;
delimiter = '.' | '-' | '+' | ',' | '/' | '_' | '=';
macro_expand = ( '%{' macro_letter transformers delimiter* '}' ) |
               '%%' | '%_' | '%-';
toplabel = ( alnum* alpha alnum* ) |
           ( alnum{1,} '-' ( alnum | '-' )* alnum );
domain_end = ( '.' toplabel '.'? ) | macro_expand;
macro_literal = 0x21 .. 0x24 | 0x26 .. 0x7E;
macro_string = ( macro_expand | macro_literal )*;
domain_spec := macro_string domain_end 0 @{ res = 1; };

And in fact, it's simpler than the ABNF grammar used in the RFC:

name             = ALPHA *( ALPHA / DIGIT / "-" / "_" / "." )
macro-letter     = "s" / "l" / "o" / "d" / "i" / "p" / "h" /
                   "c" / "r" / "t"
transformers     = *DIGIT [ "r" ]
delimiter        = "." / "-" / "+" / "," / "/" / "_" / "="
macro-expand     = ( "%{" macro-letter transformers *delimiter "}" )
                   / "%%" / "%_" / "%-"
toplabel         = ( *alphanum ALPHA *alphanum ) /
                   ( 1*alphanum "-" *( alphanum / "-" ) alphanum )
domain-end       = ( "." toplabel [ "." ] ) / macro-expand
macro-literal    = %x21-24 / %x26-7E
macro-string     = *( macro-expand / macro-literal )
domain-spec      = macro-string domain-end
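Ragel turns rules like these into a finite state machine. To get a feel for what that means, here is the toplabel rule alone, written by hand as a small DFA in C. This is a sketch, not what ragel actually generates; is_toplabel() and its state names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* States of a hand-rolled DFA for:
   toplabel = ( *alphanum ALPHA *alphanum ) /
              ( 1*alphanum "-" *( alphanum / "-" ) alphanum ) */
enum tl_state { TL_START, TL_DIGITS, TL_ALPHA, TL_DASH, TL_DASH_ALNUM, TL_ERROR };

/* Returns true if the n bytes at s match toplabel. */
bool is_toplabel(const char *s, size_t n)
{
    enum tl_state st = TL_START;
    for (size_t i = 0; i < n && st != TL_ERROR; i++) {
        unsigned char c = (unsigned char)s[i];
        bool digit = isdigit(c), alpha = isalpha(c), dash = (c == '-');
        switch (st) {
        case TL_START:      /* first char must be alphanumeric */
            st = digit ? TL_DIGITS : alpha ? TL_ALPHA : TL_ERROR;
            break;
        case TL_DIGITS:     /* only digits so far */
            st = digit ? TL_DIGITS : alpha ? TL_ALPHA : dash ? TL_DASH : TL_ERROR;
            break;
        case TL_ALPHA:      /* saw a letter, no dash yet */
            st = (digit || alpha) ? TL_ALPHA : dash ? TL_DASH : TL_ERROR;
            break;
        case TL_DASH:       /* last char was a dash */
        case TL_DASH_ALNUM: /* saw a dash, last char alphanumeric */
            st = (digit || alpha) ? TL_DASH_ALNUM : dash ? TL_DASH : TL_ERROR;
            break;
        case TL_ERROR:
            break;
        }
    }
    /* Accept: letters without dashes, or a dashed label ending in alphanumeric. */
    return st == TL_ALPHA || st == TL_DASH_ALNUM;
}
```

Even for this one rule the hand-written version is longer and harder to check against the RFC than the two ragel lines, which is the point of using a generator.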

So, thumbs up for ragel!

Update:

  • The code looks very bad on python or aggregators.
  • This piece of code alone fixed 20 test cases from the SPF suite, and now only 8 fail. Neat!

2007-03-04 14:21

This can't be good

Working on my SPF library, I ran into a problem. I needed to validate a specific element, and the Python code is a little hairy (it splits based on a large regexp, which is tricky to convert to C).

So I asked, and was told that maybe I should start from the RFC's grammar.

OK. I am not much into grammars and parsers, but what the heck. So I checked it. It's an ABNF grammar.

So I looked for the obvious thing: an ABNF parser generator.

There are very few of those, and none of them seems very solid, which is scary, because almost all RFCs define everything in terms of ABNF (except for some that do worse and define things in prose. Did you know there is no formal, verifiable definition of what an IPv6 address looks like?).

So, after hours of googling...

Does anyone know a good ABNF parser generator? I am trying abnf2c, but it's not strict enough (I am getting a parser that doesn't work).

Does anyone know why those very important documents, which rule how most of us make a living/work/have fun, are so... hazy?

2007-03-01 13:46

My SPF library kinda works

RaSPF, my attempted port of PySPF to C is now at a very special point in its life:

The provided CLI application can check SPF records and tell you what you should do with them!

Here's an example:

[build]$ ./raspfquery --ip=192.0.2.1 --sender=03.spf1-test.mailzone.com --helo=03.spf1-test.mailzone.com
Checking SPF with:

sender: 03.spf1-test.mailzone.com
helo:   03.spf1-test.mailzone.com
ip:     192.0.2.1


response:       softfail
code:           250
explanation:    domain owner discourages use of this host

Is that correct? Apparently yes!

[pyspf-2.0.2]$ python spf.py 192.0.2.1 03.spf1-test.mailzone.com 03.spf1-test.mailzone.com
('softfail', 250, 'domain owner discourages use of this host')

Is it useful? Surely you jest!

There are still the following problems:

  • Memory management is nonexistent
  • I need to hack a way to run the official SPF test suite, to see how well it works and whether it behaves exactly like PySPF
  • It will probably segfault in many places
  • I am changing the error handling to be exception-based, thanks to EXCC
  • The IPv6 support is between iffy and not there
  • There is no support for SPF (type 99) DNS records, only TXT records (need to hack the udns library)

But really, this should be about 60% of the work, and it does work for some cases, which is more than I really expected at the beginning.

Here's the whole source code of the sample application (except for CLI option processing):

spf_init();
spf_response r=spf_check(ip,sender,helo,0,0);
printf ("\nresponse:\t%s\ncode:\t\t%d\nexplanation:\t\t%s\n",
        r.response,r.code,r.explanation);

2007-02-13 11:56

C is not Python

I am porting PySPF to C (long story, and I am stupid for trying). But of course, C is not Python.

So you don't have anything nearly as nice as re.compile("whatever").split("somestring").

What is that good for, you may ask? Well, for things like splitting email addresses while validating them, or, in this specific case, validating SPF mechanisms (never mind what those are).

But hey, you can always do this (excuse me while I weep a little):

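#include <regex.h>
#include <string.h>
#include "bstrlib.h"

/* Split string at every match of pattern, keeping each match as its own
   piece. Pieces are collected NUL-separated in tmp, then split on NUL. */
struct bstrList *re_split(const char *string, const char *pattern)
{
    int eflags = 0;
    regex_t re;
    regmatch_t pmatch[20];

    if (regcomp(&re, pattern, REG_ICASE|REG_EXTENDED) != 0)
    {
        return NULL;      /* Report error. */
    }

    bstring tmp = bfromcstr("");
    const char *ptr = string;

    for (;;)
    {
        /* Stop when nothing else matches, or on an empty match at ptr,
           which would otherwise loop forever. */
        if (regexec(&re, ptr, 20, pmatch, eflags) != 0 || pmatch[0].rm_eo == 0)
        {
            break;
        }
        /* Text before the match, then the match itself. */
        bcatblk(tmp, ptr, pmatch[0].rm_so);
        bconchar(tmp, 0);
        bcatblk(tmp, ptr + pmatch[0].rm_so, pmatch[0].rm_eo - pmatch[0].rm_so);
        bconchar(tmp, 0);
        ptr += pmatch[0].rm_eo;
        eflags = REG_NOTBOL;  /* ptr is no longer the start of the string */
    }
    regfree(&re);
    bcatblk(tmp, ptr, (int)strlen(ptr));  /* whatever is left after the last match */
    struct bstrList *l = bsplit(tmp, 0);
    bdestroy(tmp);                        /* bsplit copied the pieces */
    return l;
}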

And that is probably wrong for some cases (and it doesn't split the exact same way as Python, but that's what unit testing is for).

I must be missing something that makes regcomp & friends nicer to use. Right? Right?

2007-02-12 23:31

Any regex wizard reading this?

If so, what is the C POSIX regex (you know, regcomp & friends) equivalent of this Python regular expression:

re.compile(r'^([a-z][a-z0-9_\-\.]*)=', re.IGNORECASE)

Because it sure isn't this:

regcomp(&re,"^([a-z][a-z0-9_\-\.]*)=",REG_ICASE)

I have been playing with it for two hours and am bored :-)
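For anyone hitting the same wall, here is one way to write it that should behave like the Python version with a POSIX regex implementation such as glibc's. Three separate things break the call above, as noted in the comments. This is a sketch; NAME_EQ and leading_name_len() are hypothetical names for illustration.

```c
#include <regex.h>

/* POSIX ERE counterpart of Python's
   re.compile(r'^([a-z][a-z0-9_\-\.]*)=', re.IGNORECASE)
   - REG_EXTENDED is needed: in basic REs "(" is a literal; only "\(" groups.
   - Inside a bracket expression a backslash is an ordinary character, so
     "\-" and "\." escape nothing; put '-' last in the brackets instead.
   - "\-" is not a valid escape in a C string literal either. */
static const char NAME_EQ[] = "^([a-z][a-z0-9_.-]*)=";

/* Returns the length of the name before '=', or -1 if s doesn't match. */
int leading_name_len(const char *s)
{
    regex_t re;
    regmatch_t m[2];
    int len = -1;

    if (regcomp(&re, NAME_EQ, REG_EXTENDED | REG_ICASE) != 0)
        return -1;
    if (regexec(&re, s, 2, m, 0) == 0)
        len = (int)(m[1].rm_eo - m[1].rm_so);  /* extent of group 1 */
    regfree(&re);
    return len;
}
```

For example, leading_name_len("redirect=_spf.example.com") gives 8, while "include:example.com" (no name= prefix) gives -1.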

2007-02-06 23:05

Itching.

Ok, the SPF implementation situation is kinda pathetic.

There seems to be exactly one maintained C implementation. And it's Windows-only.

  • libspf's website seems to have disappeared
  • libspf2 is not RFC-compliant (verified for 1.2.5), and their issue reporting system bounces.

So I have taken the most compliant one I found whose code I can actually follow (that would be the Python one) and am reimplementing it in C (using bstrlib and libdjbdns).

It will probably not come to a good end, but hey, it may work ;-)

2006-11-17 11:04

To the other three guys (or gals)....

... who own an HP Jornada 720, use Opie on it, and have the Spanish/Latin-American keyboard: here is your keymap.

I will write something about how to get Linux going right on it soon, but here's the status report, 48 hours in.

This baby (unnamed yet) has:

  • 32MB of RAM
  • 1 GB of Flash
  • Wifi (802.11b PCMCIA) + IrDA + Ethernet (PCMCIA) + anything else once I find a 16-bit PCMCIA-USB card (does anyone have a spare they want to recycle? ;-)
  • Decent battery life (6 hours use with wifi, 9 without)
  • A keyboard
  • A decent screen (640x240)
  • A decent Linux-based GUI (Opie)
  • A somewhat erratic touchscreen

So, what can I do with it:

  • Email
  • Web browsing ( With Konqueror goodness )
  • Programming (Python, even PyQt2!). The keyboard and screen are surprisingly decent.
  • eBook reading. This is the most important one. In my work, I spend a lot of time waiting. Waiting for the train to arrive, for the trip to end, for someone to come to a meeting, for the waiter to bring my meal, for stuff to compile, for stuff to download... maybe I wait 3 hours a day. So I read. And this screen (long and somewhat thin) is quite spectacular for reading. Opie-reader is pretty good.
  • MP3 and Video player (haven't used it yet). I have streaming TV at home, courtesy of CherryTV (check the links at the left). This should work great when Rosario wants to see Montecristo and I'd rather see Penn & Teller's show.
  • General PIM stuff. Although I tend to keep that stuff in my head and my phone.

The bad side:

  • The bizarre screen aspect ratio confuses many configuration dialogs.
  • Almost no game works unless you rotate the screen.
  • The keyboard configuration took a while, and is not perfect yet (I can't make dead_acute work for some reason)
  • The extra buttons don't work (external audio recorder, and alarm light-button)
  • I can't find a way to bind the function keys to apps in Opie
  • The reset button doesn't work (it's now a hang button)
  • Suspend is not really suspend on Linux (for unavoidable hardware reasons), so it spends battery when suspended (may last 12 hours or so, I think).
  • The only way to really turn it off is to take out the battery (not as bad as it sounds).
  • If you do that, it takes about one minute to boot.

So, I am using it more as a laptop (although a really, really small one, with very, very good battery life :-) than as a PDA.

The small memory and slow CPU mean I can't run very demanding stuff, but I never seem to do that anyway.

And of course, the really bad thing: it's so much fun to hack with, I have trouble working!

All in all, a great toy, lots of fun, and rather useful.

2006-11-02 23:20

rst2rst works (80% or so)

What is it? A program that takes a docutils document tree (parsed from an RST document or generated programmatically) and dumps it back out as the closest thing to reasonable RST I can guess.

This makes reStructuredText usable as a saveable data format, which is nice.

It's not done as a docutils writer. Sorry, I couldn't make that work.

What works? Most of it.

What doesn't? A dozen directives, custom interpreted text roles, and tables.

Yes, all of those are important. But the rest seems to work ok!

Look: for an 804-line RST document containing almost every feature of the language, the only difference in the generated HTML between the original and rst2rst's output is an invisible one, in continuation lines in line blocks.

[wp]$ python rst2rst.py t1.txt > t2.txt
[wp]$ /usr/bin/rst2html.py t1.txt t1.html ;  /usr/bin/rst2html.py t2.txt t2.html
[wp]$ diff t1.html t2.html
468,469c468,469
< <div class="line">But I'm expecting a postal order and I can pay you back
< as soon as it comes.</div>
---
> <div class="line">But I'm expecting a postal order and I can pay you back</div>
> <div class="line">as soon as it comes.</div>
[wp]$ wc -l t1.txt
804 t1.txt

You can get rst2rst.py and the testfile.

Does anyone know of a real docutils test suite I could borrow?

Contents © 2000-2019 Roberto Alsina