
Ralsina.Me — Roberto Alsina's website

This Friday will see a new rst2pdf release

Following my new policy of one release every Friday, in 6 days you will see a rst2pdf release. But not just any release: a great release.

What will be new?

  • Support for page number/section names/section numbers in headers and footers.

  • Custom interpreted text roles (that means inline styling ;-)

  • Stylesheets defined in external files. The syntax is JSON, which may look a bit strange, but it works great.

  • A Manual!

  • Easy True Type font embedding.

  • Maybe: syntax highlighting directive via pygments. I know I could make it work using the ImageFormatter, but then you can't copy the code. There is a docutils-sandbox project that does exactly what I want.

I intend to call this release 0.3.0, but maybe I will jump higher, since there is not much more left to implement.

Some more rst2pdf love, time-based releases of my code

Since revision #17 you can display page numbers in headers and footers (only!) by using this syntax:

.. header::

   This is the header. Page ###Page###

This is the content

.. footer::

   This is the footer. Page ###Page###

It has some issues if your page number is bigger than 99999999999 or your header/footer is a little longer than one line when using the placeholder, because the space required is calculated with the placeholder instead of with the number, but those are really marginal cases.

Next in line, a decent way to define custom stylesheets.

As for "time-based releases", I intend to release a new version of something every Friday.

Since I have about a dozen projects in different stages of usability, I expect this will push me a bit more towards showing this stuff instead of it rotting on my hard drive and in unknown svn repos.

Creating PDF Reports with Python and Restructured Text

This article is inspired by a thread in the PyAr mailing list. Here's the original question (translated):

From: Daniel Padula

I need some advice. I need to create an application for schools that takes student data (personal information, subjects, grades, etc.) and produces their grade report. I need to create a printed copy, and keep a historic record.

As a first step, I thought of generating them in PDF via reportlab, but I want opinions. For example, I can generate the PDF, print it and regenerate it if I need to reprint it. What other options do you see? It's basically text with tables. Reportlab? LaTeX? Some other tool?

To this I replied suggesting Restructured Text, which, if you follow my blog, should surprise no one at all ;-)

In this story I will try to bring together all the pieces to turn a chunk of Python data into a nice PDF report. Hope it's useful for someone!

Why not use reportlab directly?

Here's an example I posted in that thread: how to create a PDF with two paragraphs, using restructured text:

This is a paragraph. It has several lines, but what it says does not matter.
I can press enter anywhere, because
it ends only on a blank
line. Like this.

This is another paragraph.

And here's what you need to do in reportlab:

# -*- coding: utf-8 -*-
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.units import inch
styles = getSampleStyleSheet()
def go():
  doc = SimpleDocTemplate("phello.pdf")
  Story = [Spacer(1,2*inch)]
  style = styles["Normal"]
  p = Paragraph('''This is a paragraph. It has several lines, but what it says does not matter.
I can press enter anywhere, because
it ends when the string ends.''', style)
  Story.append(p)
  p = Paragraph('''This is another paragraph.''', style)
  Story.append(p)
  Story.append(Spacer(1,0.2*inch))
  doc.build(Story)

go()

Of course, you could write a program that takes text separated in paragraphs as its input, creates the reportlab Paragraph elements, puts them in the Story and builds the document... but then you are reinventing the restructured text parser, only worse!

Restructured text is data. Reportlab programs are code. Data is easier to generate than code.

So, how do you do a report?

You create a file with the data in it and process it via one of the many rst->pdf paths (I suggest my rst2pdf script, but feel free to use the other 9 alternatives).
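As a rough sketch of that processing step, assuming rst2pdf is installed and available on the PATH, a program could save the rst and shell out to the script. The helper names and default file names here are made up for the example; only the `rst2pdf input -o output` invocation comes from rst2pdf itself.

```python
import subprocess


def rst2pdf_command(rst_path, pdf_path):
    """Build the argument list for: rst2pdf <input> -o <output>."""
    return ["rst2pdf", rst_path, "-o", pdf_path]


def make_pdf(rst_text, rst_path="report.rst", pdf_path="report.pdf"):
    """Save the rst source, then ask rst2pdf to render it (hypothetical helper)."""
    with open(rst_path, "w", encoding="utf-8") as f:
        f.write(rst_text)
    # check=True raises CalledProcessError if rst2pdf exits with an error
    subprocess.run(rst2pdf_command(rst_path, pdf_path), check=True)
    return pdf_path
```

Keeping the rst file around after the PDF is built is deliberate: it is the part worth storing.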

Suppose you have the following data:

frobtimes = [[1,3],[3,5],[9,8]]

And you want to produce this report:

Frobniz performance
===================

* 1 frobniz: 3 seconds

* 3 frobniz: 5 seconds

* 9 frobniz: 8 seconds

You could do it this way:

print('''Frobniz performance
===================''')

for ft in frobtimes:
  print('* %d frobniz: %d seconds\n' % (ft[0], ft[1]))

And it will work. However, this means you are writing code again! This time, you are reinventing templating languages.

What you want is to use, say, Mako (or whatever). It's going to be better than your homebrew solution anyway. Here's the template for the report:

${title('Frobniz Performance')}

% for ft in frobtimes:
* ${ft[0]} frobniz: ${ft[1]} seconds

% endfor

This uses a function title defined thus:

title = lambda text: text + '\n' + '=' * len(text) + '\n\n'

You could generalize it to support multiple heading levels:

title = lambda text, level: text + '\n' + '=-~_#%^'[level] * len(text) + '\n\n'
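Written as a regular function, the same idea might look like the sketch below. The default level and the range check are additions for the example, and the order of underline characters is simply the one used in the lambda above, not something restructured text imposes.

```python
# Underline characters by heading level (same order as the lambda above).
UNDERLINES = '=-~_#%^'


def title(text, level=0):
    """Return text underlined as a restructured text section title."""
    if not 0 <= level < len(UNDERLINES):
        raise ValueError('level must be between 0 and %d' % (len(UNDERLINES) - 1))
    # The underline has to be at least as long as the title text
    return text + '\n' + UNDERLINES[level] * len(text) + '\n\n'


print(title('Frobniz performance'), end='')
print(title('Raw data', 1), end='')
```

The first call gives a title underlined with `=`, the second a subtitle underlined with `-`.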

Trickier: tables

One very common feature of reports is tables. In fact, it would be more natural to present our frobniz report as a table. The bad news is what tables look like in restructured text:

+---------+----------------+
| Frobniz | Time (seconds) |
+---------+----------------+
|        1|              3 |
+---------+----------------+
|        3|              5 |
+---------+----------------+
|        9|              8 |
+---------+----------------+

Which is very pretty, but not exactly trivial to generate. But don't worry, there is a simple solution for this, too: CSV tables:

.. csv-table:: Frobniz time measurements
   :header: Frobniz,Time(seconds)

   1,3
   3,5
   9,8

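Produces this:

Frobniz time measurements

+---------+---------------+
| Frobniz | Time(seconds) |
+---------+---------------+
| 1       | 3             |
+---------+---------------+
| 3       | 5             |
+---------+---------------+
| 9       | 8             |
+---------+---------------+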

And of course, there is Python's csv module if you want to be fancy and avoid trouble with delimiters, escaping and so on:

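import csv
from io import StringIO

def table(title, header, data):
  # Let the csv module handle delimiters and quoting for the header row
  head = StringIO()
  csv.writer(head, dialect='excel').writerow(header)
  head = head.getvalue().strip()

  # Same for the data rows
  body = StringIO()
  csv.writer(body, dialect='excel').writerows(data)
  # Indent each row so it stays inside the directive
  body = '\n'.join('   ' + row for row in body.getvalue().splitlines())

  return '.. csv-table:: %s\n   :header: %s\n\n%s\n' % (title, head, body)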

will produce neat, ready for use, csv table directives for restructured text.
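To see what the csv module buys you, here is a small sketch of how it treats awkward values. The helper name csv_line and the sample values are made up for the example.

```python
import csv
from io import StringIO


def csv_line(row):
    """Format one row as a CSV line, quoting fields only where needed."""
    out = StringIO()
    csv.writer(out, dialect='excel').writerow(row)
    # The excel dialect ends each row with a line terminator; drop it
    return out.getvalue().strip()


# A comma or a double quote inside a field would break naive ','.join()
print(csv_line(['Frobniz, large', 'say "hi"', 3]))
print(csv_line([1, 3]))
```

Fields that contain the delimiter or a quote come out wrapped in double quotes, and plain values are left alone, so you never have to write that logic yourself.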

How would it work?

This Python program is really generic. All it needs to do is combine a template (an external text file) with data in the form of a bunch of Python variables.

But how do we get the data? Well, from a database, usually. But it can come from anywhere. You could be making a report about your del.icio.us bookmarks, or about files in a folder; this is really generic stuff.

What would I use to get the data? I would use JSON in the middle. I would make my report generator take the following arguments:

  1. A Mako template name.

  2. A JSON data file.

That way, the program will be completely generic.

So, put all this together, and there's the superduper magical report generator.

Once you get rst, pass it through something to create PDFs, but store only the rst, which is (almost) plain text, searchable, easy to store, and much smaller.

I don't expect such a report generator to be over 50 lines of code, including comments.
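Here is one way such a generator could be sketched, assuming Mako is installed and that the JSON file holds an object whose keys become template variables. The function names and the command line shape are my guesses for the example, not an existing tool.

```python
import json
import sys


def title(text, level=0):
    """Underline text as a restructured text section title."""
    return text + '\n' + '=-~_#%^'[level] * len(text) + '\n\n'


def build_context(json_text):
    """Turn the JSON document into template variables, plus helpers."""
    context = json.loads(json_text)
    if not isinstance(context, dict):
        raise ValueError('the JSON data must be an object at the top level')
    # Expose helper functions to the template alongside the data
    context['title'] = title
    return context


def render_report(template_path, json_path):
    """Render the Mako template at template_path with data from json_path."""
    # Imported here so the helpers above work even without Mako installed
    from mako.template import Template
    with open(json_path, encoding='utf-8') as f:
        context = build_context(f.read())
    return Template(filename=template_path).render(**context)


if __name__ == '__main__' and len(sys.argv) == 3:
    # Usage (hypothetical): python report.py template.mako data.json > report.rst
    sys.stdout.write(render_report(sys.argv[1], sys.argv[2]))
```

With the frobniz template from earlier and a data file containing {"frobtimes": [[1, 3], [3, 5], [9, 8]]}, this should print the rst report, ready to be stored and converted to PDF.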

Missing pieces

  • While restructured text is almost plain text, there are special characters, which you should escape. That is left as an exercise to the reader ;-)

  • Someone should really write this thing ;-)
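As a starting point for the escaping exercise, here is a minimal sketch that backslash-escapes the characters restructured text most often reads as inline markup. The character list is an assumption and is not exhaustive: it ignores constructs that only matter at the start of a line (bullets, directives), and it escapes more than strictly necessary.

```python
import re

# Characters that can start inline markup in restructured text, plus the
# backslash itself. This list is an assumption, not a complete one.
_SPECIAL = re.compile(r'([\\*`_|])')


def escape_rst(text):
    """Backslash-escape characters that rst could read as inline markup."""
    return _SPECIAL.sub(r'\\\1', text)


print(escape_rst('2 * 3 and file_name'))
```

You would apply something like this to each data value before it goes into the template, not to the rendered report, or the markup you do want gets escaped too.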

Giving rst2pdf some love

Because of a thread in the PyAr list about generating reports from Python, I suggested using ReST and my rst2pdf script.

This caused a few things:

  1. I decided it's a pretty decent piece of code, and it deserves a release. Making a release means I needed to fix the most embarrassing pieces of it. So...

  2. Implemented the class directive, so it can have custom paragraph styles with very little effort.

  3. Did proper command line parsing.

  4. Did a proper setuptools script.

  5. Uploaded to PyPI.

  6. Created a release in Google Code.

So, if you want the simplest way to generate PDF files from a program in the entire pythonic universe... give it a look.

Lessons learned in a month of hobby programming

A little over a month ago, on July 15th, I opened a Google Code project called uRSSus. Here's the commit. My goal was to try building a desktop application as if I were building a web application, using an ORM, templating, generic views, and other things.

The first thing I learned is that it was more fun to just write the application and see it grow than to spend time writing the framework needed to do what I wanted, so I just kept the ORM, and the rest is pretty traditional code.

The second thing I learned is that for a hobbyist programmer, this is a golden age. I am not exactly an awesome programmer myself, and with today's tools, I could almost wish my app into existence. When I started programming on a PC, I had to swap floppies to change from the IDE to the compiler [1]. And if I made a mistake, the computer crashed. No, not the program. The computer crashed.

Now? I get a pretty dialog, a link to the position, a stack dump, etc, etc, etc. Not missing the old days at all.

Another way this is a golden age is that there is a lot of code out there. I literally had to learn to code from books. I first "got" C by reading the help for a pirated copy of Autodesk Animator's POCO extension language. There were no collections of code I could look at and learn from. There were not even any large libraries of code I could legally use!

And that's another reason why this is a golden age: Open Source and Free Software. You really can be a programmer just through will and effort. You will not lack tools, you will find users (if you are good), you will find helpers (if you are lucky), you will find free infrastructure (svn repos, free wikis, free file hosting, free everything), you will find libraries you can use!

The third thing I learned is that Python does come with batteries included. Many things that would be an annoying effort in other languages are just there, ready to be used. Add the internet, and it's a Mr. Fusion instead of a battery.

The application I developed is a news aggregator, and thanks to Mark Pilgrim I had Feed Parser, and thanks to Troll Tech (now Nokia) I had Qt for the UI, and many many other things. I could focus on application logic, not on parsing and drawing.

The fourth thing I learned is that a month is a long time when you have productive tools. uRSSus (that's my application) was functional (but awful) in a day or two. It was not awful in 2 weeks. It was pretty good in 3. In a month? Download it and see for yourself. I like it. The SVN version is much better most of the time; try revision 619 ;-)

The fifth thing I learned is that Python performance is good enough. I don't see much performance difference between uRSSus and, say, Akregator, which is C++, except in places that are obviously broken. Sure, the database is C, the UI toolkit is C++... they are all black boxes to me here. I code Python. My pieces do well.

The last thing I learned is that I can still code free software. I had not written a useful/usable large free software application in perhaps 8 years. I am 36.9 years old... excuse me if I feel middle-aged, surrounded by youngsters who are faster, more dedicated and actually have free time.

Because of the productivity of the tools, I managed to code just a couple of hours a day for the first weeks, and progress was still good, so I didn't get discouraged, and discouragement is the worst enemy of free software.

It has been a fun experiment; hopefully it will be a fun ongoing hobby.


Contents © 2000-2024 Roberto Alsina