Creating PDF Reports with Python and Restructured Text

This article is inspired by a thread on the PyAr mailing list. Here's the original question (translated):

From: Daniel Padula

I need some advice. I need to create an application for schools that takes student data (personal information, subjects, grades, etc) and produces their grade report. I need to create a printed copy, and keep a historic record.

As a first step, I thought of generating them in PDF via reportlab, but I want opinions. For example, I can generate the PDF, print it and regenerate it if I need to reprint it. What other options do you see? It's basically text with tables. Reportlab? LaTeX? Some other tool?

I replied suggesting Restructured Text, which, if you follow my blog, should surprise no one at all ;-)

In this story I will try to bring together all the pieces to turn a chunk of Python data into a nice PDF report. Hope it's useful for someone!

Why not use reportlab directly?

Here's an example I posted in that thread: how to create a PDF with two paragraphs, using restructured text:

This is a paragraph. It has several lines, but what it says does not matter.
I can press enter anywhere, because
it ends only on a blank
line. Like this.

This is another paragraph.

And here's what you need to do in reportlab:

# -*- coding: utf-8 -*-
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.units import inch
styles = getSampleStyleSheet()
def go():
  doc = SimpleDocTemplate("phello.pdf")
  Story = [Spacer(1,2*inch)]
  style = styles["Normal"]
  p = Paragraph('''This is a paragraph. It has several lines, but what it says does not matter.
I can press enter anywhere, because
it ends when the string ends.''', style)
  Story.append(p)
  p = Paragraph('''This is another paragraph.''', style)
  Story.append(p)
  Story.append(Spacer(1,0.2*inch))
  doc.build(Story)

go()

Of course, you could write a program that takes text separated in paragraphs as its input, and creates the reportlab Paragraph elements, puts them in the Story and builds the document.... but then you are reinventing the restructured text parser, only worse!

Restructured text is data. Reportlab programs are code. Data is easier to generate than code.

So, how do you do a report?

You create a file with the data in it, then process it via one of the many rst->pdf paths (I suggest my rst2pdf script, but feel free to use the other 9 alternatives).

Suppose you have the following data:

frobtimes = [[1,3],[3,5],[9,8]]

And you want to produce this report:

Frobniz performance
===================

* 1 frobniz: 3 seconds

* 3 frobniz: 5 seconds

* 9 frobniz: 8 seconds

You could do it this way:

print('''Frobniz performance
===================''')

for ft in frobtimes:
  print('* %d frobniz: %d seconds\n' % (ft[0], ft[1]))

And it will work. However, this means you are writing code again! This time, you are reinventing templating languages.

What you want is to use, say, Mako (or whatever). It's going to be better than your homebrew solution anyway. Here's the template for the report:

${title('Frobniz Performance')}

% for ft in frobtimes:
* ${ft[0]} frobniz: ${ft[1]} seconds

% endfor

This uses a function title defined thus:

title = lambda text: text + '\n' + '=' * len(text) + '\n\n'

You could generalize it to support multiple heading levels:

title = lambda text, level=0: text + '\n' + '=-~_#%^'[level] * len(text) + '\n\n'

Trickier: tables

One very common feature of reports is tables. In fact, it would be more natural to present our frobniz report as a table. The bad news is what tables look like in restructured text:

+---------+----------------+
| Frobniz | Time (seconds) |
+---------+----------------+
|        1|              3 |
+---------+----------------+
|        3|              5 |
+---------+----------------+
|        9|              8 |
+---------+----------------+

Which is very pretty, but not exactly trivial to generate. But don't worry, there is a simple solution for this, too: CSV tables:

.. csv-table:: Frobniz time measurements
   :header: Frobniz,Time(seconds)

   1,3
   3,5
   9,8

Produces this:

Frobniz time measurements

Frobniz  Time(seconds)
1        3
3        5
9        8

And of course, there is python's csv module if you want to be fancy and avoid trouble with delimiters, escaping and so on:

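import csv
from io import StringIO

def table(title, header, data):
  # Let the csv module handle quoting and escaping of the header row.
  head = StringIO()
  csv_writer = csv.writer(head, dialect='excel', lineterminator='\n')
  csv_writer.writerow(header)
  head = head.getvalue().strip()

  # Write the data rows the same way.
  body = StringIO()
  csv_writer = csv.writer(body, dialect='excel', lineterminator='\n')
  for row in data:
    csv_writer.writerow(row)
  # The directive's content must be indented, so indent every row.
  body = ''.join('   ' + line for line in body.getvalue().splitlines(True))

  return '''.. csv-table:: %s
   :header: %s

%s
''' % (title, head, body)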

This produces neat, ready-to-use csv-table directives for restructured text.

How would it work?

Such a Python program can be really generic. All it needs to do is combine a template (an external text file) with data in the form of a bunch of Python variables.

But how do we get the data? Well, from a database, usually. But it can come from anywhere. You could be making a report about your del.icio.us bookmarks, or about files in a folder; this is really generic stuff.

What would I use to get the data? I would use JSON in the middle. I would make my report generator take the following arguments:

  1. A mako template name.
  2. A JSON data file.

That way, the program will be completely generic.
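Here is a rough sketch of what the core of such a program might look like. It is only a sketch under some assumptions of mine: the function names load_data and render_report are made up for illustration, the JSON file is assumed to hold a single object whose keys become template variables, and the Mako import sits inside the function so the rest works even where Mako is not installed.

```python
import json


def title(text, level=0):
    """Return `text` underlined as a restructured text heading."""
    return text + '\n' + '=-~_#%^'[level] * len(text) + '\n\n'


def load_data(json_path):
    """Read the report data: a JSON object whose keys become template variables."""
    with open(json_path, encoding='utf-8') as f:
        data = json.load(f)
    if not isinstance(data, dict):
        raise ValueError('the JSON file must contain an object at the top level')
    return data


def render_report(template_path, json_path):
    """Fill the Mako template with the JSON data and return the rst text."""
    # Third-party dependency, imported lazily (an assumption of this sketch).
    from mako.template import Template

    # Make the title helper available to the template next to the data.
    context = {'title': title}
    context.update(load_data(json_path))
    return Template(filename=template_path).render(**context)
```

With a hypothetical data.json containing {"frobtimes": [[1, 3], [3, 5], [9, 8]]} and the template from above saved as report.tmpl, calling render_report('report.tmpl', 'data.json') should give you the rst for the frobniz report.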

So, put all this together, and there's the superduper magical report generator.

Once you get rst, pass it through something to create PDFs, but store only the rst, which is (almost) plain text, searchable, easy to store, and much smaller.
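A minimal sketch of that last step, assuming rst2pdf is installed and on your PATH. The function names store_report, pdf_path_for and rst_to_pdf are mine, invented for illustration; the only real interface used is the rst2pdf command line with its -o option.

```python
import subprocess
from pathlib import Path


def store_report(rst_text, rst_path):
    """Save the rst source; this is the copy you keep as the historic record."""
    path = Path(rst_path)
    path.write_text(rst_text, encoding='utf-8')
    return path


def pdf_path_for(rst_path):
    """Name the PDF after the rst file, with the extension swapped."""
    return Path(rst_path).with_suffix('.pdf')


def rst_to_pdf(rst_path):
    """Regenerate the PDF from the stored rst whenever a printout is needed."""
    pdf_path = pdf_path_for(rst_path)
    # Equivalent to running: rst2pdf report.rst -o report.pdf
    subprocess.run(['rst2pdf', str(rst_path), '-o', str(pdf_path)], check=True)
    return pdf_path
```

Since the PDF can always be rebuilt from the rst, there is no need to archive it.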

I don't expect such a report generator to be over 50 lines of code, including comments.

Missing pieces

  • While restructured text is almost plain text, there are special characters, which you should escape. That is left as an exercise to the reader ;-)
  • Someone should really write this thing ;-)
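As a starting point for the escaping exercise, here is a minimal sketch. The name escape_rst and the set of characters it handles are my assumptions: it backslash-escapes the characters that most often trigger inline markup (backslash, asterisk, backquote, underscore and vertical bar), and it is not an exhaustive treatment of restructured text syntax.

```python
import re

# Characters assumed to need escaping in running text; not an exhaustive list.
_RST_SPECIAL = re.compile(r'([\\*`_|])')


def escape_rst(text):
    """Put a backslash in front of characters that could start inline markup."""
    return _RST_SPECIAL.sub(r'\\\1', text)
```

You would apply it to each piece of data before it goes into the template, not to the template itself, since the template's own markup has to stay intact.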
