2008-08-31 23:01

rst2pdf: smartframes branch

Today I started a branch called SmartFrames. The main goal is to achieve a better text flow in the document (for example, for sidebars), and it is starting to get there, slowly.

Let's consider how ReST sidebars are rendered in the different writers.

We'll work with an ordinary lorem ipsum that has a sidebar declared just before it.

Here's HTML:


And here's LaTeX:


Each one has its good side and its bad side.

The HTML sidebar is a real sidebar, while the LaTeX one is some sort of insert.

OTOH, the ragged text against the HTML sidebar is ... horrid.

So, I wanted something at least a bit better than that for rst2pdf. In the best of all possible worlds, it would combine the neat text alignment of LaTeX with the floating HTML sidebar.

Here's how it looks now:


There are some minor problems with the current implementation, such as the sidebar always being aligned to the top of a paragraph, and some spacing issues.

How is it done? Let me tell you: it was not trivial :-)

In fact it's pretty evil, but here's a quick explanation:

When I get a sidebar flowable, I insert a new frame in the page template where the sidebar should go, then call a framebreak, insert the "real" sidebar, a "framecutter" and another framebreak.

The framecutter is a flowable that does nothing visible, but inserts another two frames, one at the right of the sidebar with the same height, and another below the sidebar, full width.

I need to use the framecutter because I don't know the height of the sidebar until after it's drawn.

So, we now have 4 frames instead of one:

  1. The original frame, covers the whole page, but has a framebreak above the sidebar.
  2. The sidebar frame, which is very tall, but has a framebreak below the sidebar text.
  3. A beside-the-sidebar frame, short and wide, starting at the right of the sidebar.
  4. A below-the-sidebar frame, wide and tall, starting below the sidebar.

The text should flow from 1 to 3 to 4 neatly and the seams shouldn't show.
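To make the geometry concrete, here is a minimal sketch of the areas the four frames end up covering once the sidebar height is known. It is not the code in the SmartFrames branch: the names `Frame` and `cut_frames` are made up for illustration, the sidebar is assumed to sit on the left edge, and coordinates are measured from the top-left corner (ReportLab itself measures from the bottom-left).

```python
from collections import namedtuple

# A frame is a rectangle; x, y is its top-left corner (an assumption made
# for readability, not ReportLab's own coordinate system).
Frame = namedtuple("Frame", "name x y width height")


def cut_frames(page_w, page_h, sidebar_y, sidebar_w, sidebar_h):
    """Return the areas used by the four frames around a left-edge sidebar."""
    # 1. The part of the original frame that is used before the framebreak.
    original = Frame("original", 0, 0, page_w, sidebar_y)
    # 2. The part of the sidebar frame that the sidebar text fills.
    sidebar = Frame("sidebar", 0, sidebar_y, sidebar_w, sidebar_h)
    # 3. Beside the sidebar: same height, the rest of the width.
    beside = Frame("beside", sidebar_w, sidebar_y,
                   page_w - sidebar_w, sidebar_h)
    # 4. Below the sidebar: full width, whatever height is left.
    below_y = sidebar_y + sidebar_h
    below = Frame("below", 0, below_y, page_w, page_h - below_y)
    return [original, sidebar, beside, below]


frames = cut_frames(page_w=400, page_h=600,
                    sidebar_y=100, sidebar_w=120, sidebar_h=150)
for frame in frames:
    print(frame)
```

The frames "beside" and "below" can only be computed after the sidebar has been laid out, which is why a flowable like the framecutter has to create them.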

Here's a picture that MAY make it clear (there are some odd displacements: those were bugs):


So, I'm not calling it a success yet, but it is looking decent.

2008-08-30 00:10

rst2pdf is going to be one day late. But there's a good reason.

Besides everything I mentioned yesterday, today I implemented two rather important features: cascading stylesheets, and user-defined page layouts. Here is a screenshot:


That neat two-column layout is done by adding this to the stylesheet:

"pageTemplates" : {
  "firstPage": {
      "frames": [
          ["0cm", "0cm", "49%", "100%"],
          ["51%", "0cm", "49%", "100%"]
      ]
  }
}

The name "firstPage" is magical right now, and there's no way to change from one template to another (yet), and until I do that, I won't be releasing.

Here's what cascading stylesheets do. Suppose you want to use A5 paper and size 14 Times New Roman fonts. Here's all the stylesheet you need:

{
  "pageSetup" : {
    "size": "A5"
  },
  "fontsAlias" : {
    "stdFont": "Times-Roman"
  },
  "styles" : [
    ["base" , {
      "fontSize": 14,
      "leading": 16
    }]
  ]
}

Also, you can specify as many stylesheets as you want on the command line. So you can have one that sets the paper size, one for page layout as above, one for font "sets", etc.
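The cascading idea can be sketched in a few lines of Python. This is an illustration under assumptions, not rst2pdf's actual code: `merge_stylesheets` and `merge_styles` are hypothetical names, and the rule used here (later stylesheets override earlier ones, key by key) is simply one reasonable way to cascade.

```python
import json


def merge_styles(base, extra):
    """Merge [name, properties] pairs, keeping first-seen order."""
    names = [name for name, _ in base]
    props = {name: dict(p) for name, p in base}
    for name, p in extra:
        if name not in props:
            names.append(name)
            props[name] = {}
        props[name].update(p)
    return [[name, props[name]] for name in names]


def merge_stylesheets(sheets):
    """Merge parsed stylesheets in order; later ones override earlier ones."""
    merged = {"styles": []}
    for sheet in sheets:
        for key, value in sheet.items():
            if key == "styles":
                merged["styles"] = merge_styles(merged["styles"], value)
            elif isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key].update(value)
            else:
                merged[key] = dict(value) if isinstance(value, dict) else value
    return merged


paper = json.loads('{"pageSetup": {"size": "A5"}}')
fonts = json.loads('{"fontsAlias": {"stdFont": "Times-Roman"},'
                   ' "styles": [["base", {"fontSize": 14, "leading": 16}]]}')
small = json.loads('{"styles": [["base", {"fontSize": 10}]]}')

merged = merge_stylesheets([paper, fonts, small])
print(json.dumps(merged, indent=2))
```

In this sketch the third stylesheet only changes the font size of "base"; the leading set by the second one survives.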

Neat, isn't it?

2008-08-28 23:26

rst2pdf will be released again tomorrow. And it's a good release.

How good? Let me tell you...

  • Support for PDF table of contents
  • Section names and numbers in headers/footers
  • Compressed PDFs (or not)
  • Guess image sizes. Especially if you meant to use them in a web page and declared just ":width: 50%"
  • Gutter margin support
  • Raw directive (insert pagebreaks and vertical space manually)
  • Offers a docutils-compliant API (and another API, too)
  • Include full or partial files for code-block. That means you can extract code and show it in your document!
  • Huge code cleanup led by Nicolas Laurance.
  • Working multilingual hyphenation. You can have a per-paragraph language and hyphenate it correctly.

2008-08-27 23:46

How pretty is rst2pdf's output? Take a look.

I am a big Alexandre Dumas fan. He's the direct ancestor of Neal Stephenson, so many of you should like him too. So I used one of his best books to try some automatic typesetting of Project Gutenberg texts.

No, the whole book did not convert without errors, and yes, there is some manual work in what you are about to see, but hey, take a look.

Here's a distant view of the first two pages:


And here's some detail of the typesetting:


Yes, the typesetting is not really LaTeX quality, but it's not bad, either.

Compare it with the HTML version at Project Gutenberg. The typesetting is a thing of beauty compared to that :-(

The image is a picture of Chateau d'If from Flickr, released under Creative Commons. The title font is Scriptina; I chose it because it looks 19th century but modern.

2008-08-25 14:42

rst2pdf: release fever!

I did a release yesterday, and another today of my rst-to-pdf-without-latex tool. What's new? Here's an incomplete list:

New in 0.4

  • Fixed bullet and item lists indentation/nesting.
  • Implemented citations
  • Working links between footnotes and their references
  • Justification enabled by default
  • Fixed table bug (demo.txt works now)
  • Title and author support in PDF properties
  • Support for document title in header/footer
  • Custom page sizes in stylesheet

New in 0.3

  • Font embedding (use any True Type font in your PDFs)
  • Syntax highlighter using Pygments
  • User's manual
  • External/custom stylesheets
  • Support for page numbers in header/footer

Of course, since I said I would release something every Friday, this means I need to find something else to release ;-)

2008-08-24 13:33

rst2pdf love: syntax highlighting

This mini-sprint is doing wonders for rst2pdf. Now on SVN: pygments-based syntax highlighting. Example here: rst2pdf's code, in a PDF by rst2pdf.

2008-08-24 10:11

This Friday will see a new rst2pdf release

Following my new policy of one release every Friday, in 6 days you will see an rst2pdf release. But not just any release: a great release.

What will be new?

  • Support for page number/section names/section numbers in headers and footers.
  • Custom interpreted text roles (that means inline styling ;-)
  • Stylesheets defined in external files. The syntax is JSON, which may look a bit strange, but it works great.
  • A Manual!
  • Easy True Type font embedding.
  • Maybe: syntax highlighting directive via pygments. I know I could make it work using the ImageFormatter, but then you can't copy the code. There is a docutils-sandbox project that does exactly what I want.

I intend to call this release 0.3.0, but maybe I will jump higher, since there is not much more left to implement.

2008-08-23 20:26

Some more rst2pdf love, time-based releases of my code

Since revision 17 you can display page numbers in headers and footers (only!) by using this syntax:

.. header::

   This is the header. Page ###Page###

This is the content

.. footer::

   This is the footer. Page ###Page###

It has some issues if your page number is bigger than 99999999999 or your header/footer is a little longer than one line when using the placeholder, because the space required is calculated with the placeholder instead of with the number, but those are really marginal cases.

Next in line, a decent way to define custom stylesheets.

As for "time-based releases", I intend to release a new version of something every Friday.

Since I have about a dozen projects in different stages of usability, I expect this will push me a bit more towards showing this stuff instead of letting it rot on my hard drive and in unknown svn repos.

2008-08-22 23:08

Creating PDF Reports with Python and Restructured Text

This article is inspired by a thread in the PyAr mailing list. Here's the original question (translated):

From: Daniel Padula

I need some advice. I need to create an application for schools that takes student data (personal information, subjects, grades, etc) and produces their grade report. I need to create a printed copy, and keep a historic record.

As a first step, I thought of generating them in PDF via reportlab, but I want opinions. For example, I can generate the PDF, print it and regenerate it if I need to reprint it. What other options do you see? It's basically text with tables. Reportlab? LaTeX? Some other tool?

To this I replied suggesting Restructured Text, which, if you follow my blog, should surprise no one at all ;-)

In this story I will try to bring together all the pieces to turn a chunk of python data into a nice PDF report. Hope it's useful for someone!

Why not use reportlab directly?

Here's an example I posted in that thread: how to create a PDF with two paragraphs, using restructured text:

This is a paragraph. It has several lines, but what it says does not matter.
I can press enter anywhere, because
it ends only on a blank
line. Like this.

This is another paragraph.

And here's what you need to do in reportlab:

# -*- coding: utf-8 -*-
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.units import inch

styles = getSampleStyleSheet()

def go():
    doc = SimpleDocTemplate("phello.pdf")
    # The story is the list of flowables that make up the document
    Story = [Spacer(1, 2*inch)]
    style = styles["Normal"]
    p = Paragraph('''This is a paragraph. It has several lines, but what it says does not matter.
I can press enter anywhere, because
it ends when the string ends.''', style)
    Story.append(p)
    p = Paragraph('''This is another paragraph.''', style)
    Story.append(p)
    # Lay out the story and write phello.pdf
    doc.build(Story)

go()


Of course, you could write a program that takes text separated in paragraphs as its input, and creates the reportlab Paragraph elements, puts them in the Story and builds the document.... but then you are reinventing the restructured text parser, only worse!

Restructured text is data. Reportlab programs are code. Data is easier to generate than code.

So, how do you do a report?

You create a file with the data in it, and process it via one of the many rst->pdf paths (I suggest my rst2pdf script, but feel free to use the other 9 alternatives).

Suppose you have the following data:

frobtimes = [[1,3],[3,5],[9,8]]

And you want to produce this report:

Frobniz performance

* 1 frobniz: 3 seconds

* 3 frobniz: 5 seconds

* 9 frobniz: 8 seconds

You could do it this way:

print('''Frobniz performance
===================
''')
for ft in frobtimes:
    print('* %d frobniz: %d seconds\n' % (ft[0], ft[1]))

And it will work. However, this means you are writing code again! This time, you are reinventing templating.

What you want is to use, say, Mako (or whatever). It's going to be better than your homebrew solution anyway. Here's the template for the report:

${title('Frobniz Performance')}

% for ft in frobtimes:
* ${ft[0]} frobniz: ${ft[1]} seconds

% endfor

This uses a function title defined thus:

title = lambda text: text + '\n' + '=' * len(text) + '\n\n'

You could generalize it to support multiple heading levels:

title = lambda text, level: text + '\n' + '=-~_#%^'[level] * len(text) + '\n\n'
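As a quick check, here is the same helper written as a regular function, with a default level and a guard against running out of underline characters. The constant name `RST_UNDERLINES` and the error message are my own additions. Keep in mind that restructured text assigns heading levels by the order in which underline styles first appear in the document, so the levels have to be used consistently.

```python
RST_UNDERLINES = "=-~_#%^"  # one underline character per heading level


def title(text, level=0):
    """Return text underlined as a restructured text section title."""
    if not 0 <= level < len(RST_UNDERLINES):
        raise ValueError("no underline character for level %d" % level)
    return text + "\n" + RST_UNDERLINES[level] * len(text) + "\n\n"


print(title("Frobniz Performance"), end="")
print(title("Details", level=1), end="")
```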

Trickier: tables

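One very common feature of reports is tables. In fact, it would be more natural to present our frobniz report as a table. The bad news is what tables look like in restructured text:

+---------+----------------+
| Frobniz | Time (seconds) |
+=========+================+
|       1 |              3 |
+---------+----------------+
|       3 |              5 |
+---------+----------------+
|       9 |              8 |
+---------+----------------+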

Which is very pretty, but not exactly trivial to generate. But don't worry, there is a simple solution for this, too: CSV tables:

.. csv-table:: Frobniz time measurements
   :header: Frobniz,Time(seconds)

   1,3
   3,5
   9,8

Produces this:

Frobniz time measurements

=======  =============
Frobniz  Time(seconds)
=======  =============
1        3
3        5
9        8
=======  =============

And of course, there is python's csv module if you want to be fancy and avoid trouble with delimiters, escaping and so on:

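import csv
from io import StringIO

def table(title, header, data):
    # Let the csv module deal with delimiters, quoting and escaping
    head = StringIO()
    csv.writer(head, dialect='excel').writerow(header)
    head = head.getvalue().strip()

    body = StringIO()
    csv_writer = csv.writer(body, dialect='excel')
    for row in data:
        csv_writer.writerow(row)
    # The rows must be indented to belong to the directive
    rows = ''.join('   %s\n' % line for line in body.getvalue().splitlines())

    return '.. csv-table:: %s\n   :header: %s\n\n%s\n' % (title, head, rows)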


will produce neat, ready for use, csv table directives for restructured text.

How would it work?

This python program is really generic. All it needs to do is combine a template (an external text file) with data in the form of a bunch of python variables.

But how do we get the data? Well, from a database, usually. But it can come from anywhere. You could be making a report about your del.icio.us bookmarks, or about files in a folder. This is really generic stuff.

What would I use to get the data? I would use JSON in the middle. I would make my report generator take the following arguments:

  1. A Mako template name.
  2. A JSON data file.

That way, the program will be completely generic.

So, put all this together, and there's the superduper magical report generator.

Once you get rst, pass it through something to create PDFs, but store only the rst, which is (almost) plain text, searchable, easy to store, and much smaller.

I don't expect such a report generator to be over 50 lines of code, including comments.
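A minimal sketch of what such a generator could look like, assuming Mako is installed and the JSON file holds an object whose keys are the variables the template uses (for example `{"frobtimes": [[1, 3], [3, 5], [9, 8]]}`). The function names `load_data` and `generate_report` are hypothetical.

```python
import json


def title(text, level=0):
    """Underline text so restructured text treats it as a section title."""
    return text + "\n" + "=-~_#%^"[level] * len(text) + "\n\n"


def load_data(json_path):
    """Read the report data: a JSON object whose keys become template names."""
    with open(json_path, encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        raise ValueError("the JSON file must contain an object at top level")
    return data


def generate_report(template_path, json_path):
    """Render the Mako template with the JSON data and return the rst text."""
    # Imported here so the helpers above can be used without Mako installed.
    from mako.template import Template

    data = load_data(json_path)
    # title() is passed alongside the data, so a "title" key in the JSON
    # file would clash with it.
    return Template(filename=template_path).render(title=title, **data)
```

Calling `generate_report("report.tmpl", "frobtimes.json")` would return the restructured text, which you can save to a file and hand to rst2pdf or any other rst-to-PDF tool.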

Missing pieces

  • While restructured text is almost plain text, there are special characters, which you should escape. That is left as an exercise to the reader ;-)
  • Someone should really write this thing ;-)
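For the escaping exercise, here is a deliberately simple starting point. It puts a backslash before the characters that restructured text can read as inline markup. The function name `escape_rst` and the choice of characters are assumptions; it escapes more than strictly necessary and does nothing about constructs that depend on the start of a line (bullets, directives, and so on).

```python
import re

# Characters that can start or end inline markup, plus the backslash itself.
_SPECIAL = re.compile(r"([\\*`_|])")


def escape_rst(text):
    """Backslash-escape characters that rst could read as inline markup."""
    return _SPECIAL.sub(r"\\\1", text)


print(escape_rst("2 * 3 is not *emphasis*"))
```

Values coming from the data file would go through a function like this before being placed in the template, while the template's own markup stays untouched.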

Contents © 2000-2019 Roberto Alsina