Skip to main content

Ralsina.Me — Roberto Alsina's website

Dear readers, a question!

rst2rst was go­ing along just fine, but I have run in­to the re­al prob­lem: TA­BLES.

To re­fresh your meme­o­ry (if you know) or let you know any­way, RST is an ascii markup lan­guage. And it sup­ports ta­bles. Like this ex­am­ple:

+----------+----------------------+
|          |                      |
+----------+------------+---------+
|          |            |         |
+----------+------------+         +
|                       |         |
+-----------------------+---------+

The pars­er gives me a tree struc­ture:

row (en­try, en­try with morecol­s=1) row (en­try,en­try,en­try with morerows=1) row (en­try with morecol­s=1)

Now, each cell has a min­i­mum size set by its con­tents.

For ex­am­ple, a cell con­tain­ing "Hel­lo" has a min­i­mum size of 5x1 (wx­h).

It is a good idea to sur­round the con­tents with a blank row/­colum­n, so make that 7x3. I can fig­ure this out from the con­tents.

And here's the trick ques­tion... any­one knows an al­go­rithm to do this that is not in­cred­i­bly dif­fi­cult? Or, if it is in­cred­i­bly dif­fi­cult, care to help me? ;-)

taj / 2006-11-03 21:56:

What would the output of the desired function look like?

Roberto Alsina / 2006-11-03 23:27:

An asci art table again, hopefully
:-)

Anonymous / 2006-11-04 00:32:

Can't you just use max_row_height as row height and max_col_width as column width?

Anonymous / 2006-11-04 00:34:

Hmm.. after a bit of thinking I see that you can't. In which case max_row_height for cells which don't have morerows = 1 and similarly for columns.

Think that works, no?

Roberto Alsina / 2006-11-04 01:38:

Not exactly, because you also need that the sum of the two rows be larger than the morerows=1 cell

Jonas Widarsson / 2006-11-04 10:09:

I figure you are rendering RST markup from a tree representation, correct? You said so I think some days ago, but it doesn't say so in this article.

Anyway, My first guess would be to loop the entire tree to record all the quirks first, and then when the data is known to the table renderer, you can loop over it again and print it out. Let's call it pass 1 and pass 2.

Pass 1:
Loop the tree and let every cell report to the row and column it is in how much space it needs. Then calculate a max(height) on all the cells in a row and a max(width) on all the cells in a column.

Pass 2:
Loop again to just print it out according to the rows and cols dimensions you have found out.

You will have to find out how to make Pass 1 discover what to do with the multiple row/col spanning cells. maybe just distributing it's requirement on several virtual cells before the max() calculations in Pass 1.

Then of course some genius might think only one pass is necessary. Or even say I am an idiot not thinking of this and that. ;)

I like these kinds of problems. Email me if you want to discuss it further.

Cheers!

Roberto Alsina / 2006-11-04 12:08:

Jonas, I am leaning in that direction. In fact, I have collected the data, and generated a matrix of it.

I believe I can make things work for multicol/row things by averaging (or putting all the size on the first column/row, but that gives ugly (working) layout.

Jonas Widarsson / 2006-11-04 13:16:

If you have one multicell spanning several rows/cols, I suggest you first see what requirements the rows cols it spans have.

Let's say you span a cell (A) over two rows where one (rA) wants to be twice as large as the other (rB). Then let's say that A just a little bit higher than rA and rB are together. Then it would produce an unnecessarily large request of space in rA if you simply divide A into two parts. You should distribute the height of A to rA and rB respecting their original requests for space percentually. Hope you understand. Then you would only get minimal space waste, I think (at this point ;) )

One thing is unclear to me, and that is how you will want to produce the cell content. If it is much data in it, will you wrap the lines? if so, at which dimensional point? It might require you to wait with that decision until all the other cells are calculated.

These cases all raise a problem about having more than one multicell in the same row/col. You would have several unknown variables wanting to rule the result.

Maybe one could iterate the dimension decision process until values stagnate. and only calculate simple cells on the first iteration. But that's probably an expensive solution. Someone better than me at math would suggest something else.

Pablo / 2006-11-04 13:29:

Hi Roberto,

I found an interesting solution. Rows and columns can be treated almost independently in this algorithm. The idea starts like people commented before: for each column, the minimum width is calculated by taking the maximum width among the cells in that column which don't span any other columns. Do the same for rows.

At this point, you have minimum widths m_1, ..., m_n for each column. You now have to expand some of the columns to make sure multi-column cells fit in the table. Let's say you expand columns from 1 to n by e_1 to e_n.
You could get a simple solution from here by simply applying an heuristic, but what we're trying to do is minimize the total size of the table. Since m_1+...+m_n is fixed, this is the same as saying we want to minimize the sum of the e_i.

Every multi-column cell is a restriction upon this e_i. For example, a cell spanning columns 1 and 2, whose minimum width is 10 implies the restriction: m_1+e_1+m_2+e_2 >= 10. Since m_1 and m_2 are already known at this point, you can subtract them from 10 and obtain a new restriction e_1 + e_2 >= z (discard it if z

Pablo / 2006-11-04 13:32:

(My last post got clipped because of a < sign in one of the equations. Here it goes again.)

Hi Roberto,

I found an interesting solution. Rows and columns can be treated almost independently in this algorithm. The idea starts like people commented before: for each column, the minimum width is calculated by taking the maximum width among the cells in that column which don't span any other columns. Do the same for rows.

At this point, you have minimum widths m_1, ..., m_n for each column. You now have to expand some of the columns to make sure multi-column cells fit in the table. Let's say you expand columns from 1 to n by e_1 to e_n.
You could get a simple solution from here by simply applying an heuristic, but what we're trying to do is minimize the total size of the table. Since m_1+...+m_n is fixed, this is the same as saying we want to minimize the sum of the e_i.

Every multi-column cell is a restriction upon this e_i. For example, a cell spanning columns 1 and 2, whose minimum width is 10 implies the restriction: m_1+e_1+m_2+e_2 >= 10. Since m_1 and m_2 are already known at this point, you can subtract them from 10 and obtain a new restriction e_1 + e_2 >= z (discard it if z <= 0).

This is a particular instance of a linear programming problem, which has many known algorithms. Since we're already bordering on overkill, I would use one of the simplest ones, like the Simplex Algorithm.

That's it. Do the same for rows and you're done. If you give me some time, I can sketch out some code in Python so you don't have to use any external libraries to solve the minimization part.

By the way, thanks for blogging about RST, it's been an interesting discovery. I've been coding an HTML converter in Python for my own simplified version of RST (though I probably wouldn't use it for any serious projects).

Saludos desde Córdoba.

Roberto Alsina / 2006-11-04 14:00:

Pablo: I remember implementing Simplex in Basic 16 years ago, but I have completely forgotten it.

So yeah, a linear programming solution looks like a winner.

Now, if you can hack up some code, that would be coolest :-)

Roberto Alsina / 2006-11-04 14:25:

I think I can try it using this:

http://mail.python.org/pipe...

The syntax is nice and simple enough.

Roberto Alsina / 2006-11-04 15:17:

Newsflash: of course this is a MIP (integer values needed :-) so it's a much harder problem.

lp_solve seems to handle those and has a python module.

Pablo / 2006-11-04 20:16:

Oh, I missed that bit. I guess you have to use an external lib, then.

However, if you still want to use the previous algorithm, you can obtain a valid and pretty-close-to-optimal solution by rounding up the results for each row and column after running the Simplex.