2006-11-02 23:20

rst2rst works (80% or so)

What is it? A program that takes a docutils document tree (parsed from an RST document or programmatically generated) and dumps back the closest thing to reasonable RST that I can guess.

This lets Restructured Text be a saveable data format, which is nice.

It's not done as a docutils writer. Sorry, I couldn't make that work.

What works? Most of it.

What doesn't? A dozen directives, custom interpreted text roles, and tables.

Yes, all of those are important. But the rest seems to work ok!

Look: an 804-line RST document containing almost every feature of the language, and the only difference in the generated HTML output between the original and rst2rst's version is an invisible difference in continuation lines in line blocks.

$ python rst2rst.py t1.txt > t2.txt
$ /usr/bin/rst2html.py t1.txt t1.html ;  /usr/bin/rst2html.py t2.txt t2.html
$ diff t1.html t2.html
468,469c468,469
< <div class="line">But I'm expecting a postal order and I can pay you back
< as soon as it comes.</div>
---
> <div class="line">But I'm expecting a postal order and I can pay you back</div>
> <div class="line">as soon as it comes.</div>
$ wc -l t1.txt
804 t1.txt

You can get rst2rst.py and the testfile.

Does anyone know of a real docutils test suite I could borrow?

2006-10-29 15:59

Hacking Restructured Text

I am a great fan of Restructured Text. I write my blog using it. I write my business proposals using it, I write my documentation using it, and I think you should write almost everything you write now using it. I have even blogged many times about it.

RST is a minimal markup language. You can figure it out in a couple of hours, and then use it to produce pretty HTML pages, PDF docs, man pages, LaTeX documents, S5 slides, and other things.

Plus, the source works as a plain text version, and is very readable:

This is a title
===============

Some text in a paragraph

A subtitle
----------

* A list

* More items

  1. A numbered sublist

  2. Another item

     a) A sub-sub-list

     b) With more items


+-----------------------+-------------------------+
|   A table             | With two columns        |
+-----------------------+-------------------------+
|  And Two              |   rows                  |
+-----------------------+-------------------------+

See? Nice.

RST has another great thing that is not so well known: there is a parser for it, which turns the document into a tree of nodes representing the different parts of the document.

You can manipulate this node tree, modifying the document, and then generate the output.

But there is no way, right now, to generate RST from the tree. Which means it's a one-way road.

Well, I am hacking to fix that.

Right now, I handle titles, sections, all sorts of lists, transitions, quotes, emphasis, italics, and a few other elements.

The only ones that seem difficult to implement are tables, but I still think I can do it. Although the produced RST doesn't look the same as the original, it is functionally identical.
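
To give an idea of what "generating RST from the tree" involves, here is a minimal sketch in Python. It is not rst2rst itself and it does not use docutils: the `Node` class, the node kinds and the choice of underline characters are all made up for the illustration, and only three section levels are covered.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a docutils node: a kind, some text, children."""
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


# One underline character per section depth (an arbitrary choice for this sketch;
# deeper nesting than three levels is not handled).
UNDERLINES = "=-~"


def to_rst(node, depth=0):
    """Return the list of RST lines for a node and everything below it."""
    if node.kind == "document":
        lines = []
        for child in node.children:
            lines += to_rst(child, depth)
        return lines
    if node.kind == "section":
        # Title, underline of the same length, blank line, then the contents.
        lines = [node.text, UNDERLINES[depth] * len(node.text), ""]
        for child in node.children:
            lines += to_rst(child, depth + 1)
        return lines
    if node.kind == "paragraph":
        return [node.text, ""]
    if node.kind == "bullet_list":
        lines = []
        for item in node.children:
            lines += [f"* {item.text}", ""]
        return lines
    raise ValueError(f"unsupported node kind: {node.kind}")
```

Calling `"\n".join(to_rst(tree))` on a small tree gives back text in the same shape as the sample document above, even if it is not character-for-character what the author of the tree originally typed.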

How do I test if it works? With a test suite. If it works, it should be invariant this way:

RSTsample -> rst2html produces the exact same output as RSTsample -> rst2rst -> rst2html
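
That invariant can be written down as a small Python check. This is only a sketch: `to_rst` and `to_html` are parameters standing in for whatever actually runs rst2rst and rst2html (they are assumptions, not real APIs), so the function only captures the comparison.

```python
import difflib


def roundtrip_diff(source, to_rst, to_html):
    """Diff the HTML made from `source` against the HTML made from its regenerated RST.

    An empty list means the invariant holds for this sample.
    """
    direct = to_html(source)
    roundtrip = to_html(to_rst(source))
    return list(difflib.unified_diff(
        direct.splitlines(),
        roundtrip.splitlines(),
        fromfile="original",
        tofile="roundtrip",
        lineterm="",
    ))
```

Running it over every sample in a test suite and printing any non-empty result would show exactly which constructs do not survive the round trip.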

If anyone wants a copy, email me.

2006-10-25 16:01

Some people say anything

Last night I saw an "investigative news" program on TV. It's called "Informe Central", and their headline story was about an abandoned factory in San Telmo (where tourists go to see typical BA and locals go to see tourists).

The thing is, that factory has been taken over by poor people who live there. It's conveniently located, and they don't pay anything.

On the other hand, it's a nest of drugs, rape, poverty and violence, but that's not the only thing these "journalists" said.

They said the residents live in inhumane conditions, with up to 2.6 persons per square meter.

They also said about 300 people live there, which would mean there are roughly 115 square meters in the factory.

The factory is, actually, closer to 1200 square meters. Or maybe 5000. But they kept on saying those numbers.
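
The arithmetic is easy to check. A tiny Python sketch, using only the figures quoted above (300 people, 2.6 persons per square meter, and my guesses of 1200 or 5000 square meters):

```python
def implied_area(people, persons_per_m2):
    """Floor area (in square meters) needed for `people` at the given density."""
    return people / persons_per_m2


def density(people, area_m2):
    """Persons per square meter for `people` spread over `area_m2`."""
    return people / area_m2


# Area implied by the program's own numbers.
print(round(implied_area(300, 2.6), 1))

# Density if the factory is 1200 or 5000 square meters instead.
for area in (1200, 5000):
    print(area, density(300, area))
```

With 1200 square meters the density comes out at a quarter of a person per square meter, about a tenth of the figure they kept repeating.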

Do you know that in order to have 2.6 persons per square meter so that each of them has a small (double) bunk bed, you would have to put the bunk beds one next to the other with 20cm-wide spaces in between?

How the hell did they get that number? Is that a sign of their regular investigative quality? Probably.

2006-10-24 09:37

An application idea

Yesterday I wrote that I have too many ideas. Ok, here's another one:

A word processor for writers. And when I say writers, I mean novelists, technical book writers, script writers, playwrights...

Word is not very good for a writer. OpenOffice is not good. KWord is probably worse (because of the emphasis on page layout). LyX is probably as good as it gets, and it's not exactly perfect.

A writer actually needs a simple-ish word processor with a bunch of ancillary gadgetry.

For example:

  • Statistics:
    • How many words/chars/pages a day is he writing
    • A live word/char counter
    • A live word frequency monitor (put the cursor on a word and see how often it's used)
    • Live counter of document/chapter/section/scene size.
  • Outlining
    • Real live outlining. The kind where you drag stuff around and the text follows.
    • An editable full-text outline view
  • Collaboration
    • Multiple editors
    • Version control
  • Projects
    • Multiple files per project
    • Linking files to places on the text in other files
  • Index cards
    • Associating index cards to places on the text
    • Grouping index cards (for example, per character, or per location)
    • Placing them on a timeline or a storyboard
  • Live Thesaurus / Dictionary
    • Show definitions and alternatives as the pointer crosses a word.
    • One click replacement
  • Styling
    • Per fragment/paragraph styles
    • User defined
    • Predefined styles

There are a bazillion things he does not need, though, like detailed page layout, or grammar checking.

It would be nice if it could later be easily imported (styled!) into something like Scribus so a decent page layout could be done, but it doesn't need to be in the same app at all.

The text engines in Qt4 are good enough for all this app needs graphically.

RestructuredText is good enough to provide a backend, a parser, an exporter, a reader, a transformer, whatever.

So there it is, another idea I will most likely not implement. Someone please run with it, you can probably make it a rather expensive GPL shareware on Mac ;-)

2006-10-23 17:54

Wifi dongle

Bought an Eusso (No, I had never heard of them either) Wifi USB dongle.

Why?

  • It says "linux driver" on the blister
  • It's the cheapest 802.11g thing on the local ebay-like place
  • My ancient pcmcia 802.11b card sucks.

I am thinking of buying half a dozen more and getting rid of all the cables for all my boxes, all of Rosario's office and the guest computers (yes, I do have guest computers. They are there so my guests have their own computers :-).

Plugged it in and it worked (ok, I had to install the zd1211 driver, which took me 40 seconds). Only problem: it's hot. HOT.

So, need a USB/WiFi thingie that works well in Linux? You can do worse than this baby.

2006-10-23 16:26

So much cool stuff, so little time.

I read Zack Rusin's blog about benchmarking vector graphic APIs... then I see a comment mentioning Antigrain. Then I check the antigrain examples, and they are gorgeous, and pretty fast! Even on a lame Sis630!

Then it hit me... I am never going to do anything with it (or with Qt's Arthur). Maybe I am getting old, but I see a swirl of cool software... dparser... asymptote... txt2tags ... (and those are only the ones I saw in the last week).

All of them are about something that interests me, but I simply can't do anything. I mean, would it be cool to write a vector-app-for-kids with antigrain (or Arthur?) Sure!

Would I like to implement this shell-style language I have floating in my head for a year using dparser (or pyparsing?) Yeah! Would I like to hack a Trac plugin using txt2tags (or restructured text?) Sure!

But when can I do that? I have my business, my wife, her pregnancy, my other projects... maybe that's what happens when you become old. You gather enough baggage that you can't lift any more backpacks in your trek.

But what can I do with all the ideas swirling in my head? Really! What?

2006-10-19 14:23

Moving load around with netpipes.

I had an emergency. The CPU usage of a certain mail server was rising, and the culprit was clamd.

For some reason, over the last few months, clamd's CPU usage kept rising, and it was now averaging nearly 70% of the server's CPU.

Removing the antivirus is, of course, not an option. On the other hand, performance was starting to suffer.

The usual response would be a full retooling of the setup, multiple SMTP servers handling the load against a central storage server, clamav running on each SMTP... but switching to that involves a full reimplementation of the system. Because of the antivirus??? Hell no.

So, I started investigating how I could move clamd to another box, like I did with spamassassin. It was not pretty.

  • clamav has a protocol defined for connecting to remote servers.
  • clamav doesn't have a client for it.
  • clamd-stream-client doesn't seem to work.

So, I thought... let's be original. What do I actually need?

I need to be able to call clamdscan and have it scan the current folder. Based on its exit status code (0/1/2), the mail is accepted, rejected, or temporarily rejected, respectively.

Having the same folder structure available to two boxes is trivial. I have NFS, lots of bandwidth and another computer.

Running clamdscan on the second box to scan those folders is trivial too.

The missing piece is a way to tell the second box's clamd to scan, and get the exit code in the mail server.

Enter netpipes!

Netpipes is software to "make TCP sockets usable from the shell". You can find it at http://web.purplefrog.com/~thoth/netpipes/netpipes.html.

And here's a replacement clamdscan which works the way I wanted it:

#!/bin/dash
# Send the current directory to the remote scanner and exit with the code it returns.
exit `echo $PWD | hose 192.168.1.53 9000 --slave`

This version takes the folder you want to scan as an argument:

#!/bin/dash
# Same idea, but send the command-line arguments instead of the current directory.
exit `echo $* | hose 192.168.1.53 9000 --slave`

And here is the "server side". First netclam.sh:

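#!/bin/dash -x
# Read the scan arguments from the client, run clamdscan quietly,
# and send its exit status back over the socket.
read args
/usr/bin/clamdscan $args >/dev/null 2>&1
echo $?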

Then the "network code":

faucet 9000  --in --out /usr/bin/netclam.sh

And there you have it. ClamAV moved to another server. With 5 lines of shell code.
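
For anyone without netpipes at hand, the same one-line protocol (send a path, get an exit code back) can be sketched in Python with only the standard library. This is not what runs on my servers; the names `ScanHandler`, `make_server` and `remote_scan` are invented for the sketch, and it assumes clamdscan lives at /usr/bin/clamdscan as in netclam.sh.

```python
import socket
import socketserver
import subprocess

# Same binary netclam.sh calls; adjust if clamdscan lives elsewhere.
SCAN_COMMAND = ["/usr/bin/clamdscan"]


class ScanHandler(socketserver.StreamRequestHandler):
    """Server side: read one line of arguments, scan, reply with the exit code."""

    def handle(self):
        args = self.rfile.readline().decode().split()
        result = subprocess.run(
            self.server.scan_command + args,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        self.wfile.write(f"{result.returncode}\n".encode())


def make_server(address, scan_command=SCAN_COMMAND):
    """Plays the role of faucet: listen on `address`, run the handler per connection."""
    server = socketserver.TCPServer(address, ScanHandler)
    server.scan_command = list(scan_command)
    return server


def remote_scan(host, port, path):
    """Plays the role of hose: send `path`, return the exit code the server reports."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall(f"{path}\n".encode())
        conn.shutdown(socket.SHUT_WR)  # tell the server we are done sending
        with conn.makefile() as reply:
            return int(reply.read().strip())
```

On the scanning box you would call `make_server(("", 9000)).serve_forever()`, and the replacement clamdscan on the mail server would boil down to `sys.exit(remote_scan("192.168.1.53", 9000, os.getcwd()))`. It is longer than the shell version, which is rather the point of using netpipes.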

2006-10-17 21:06

No, I don't get a dime from them

For a few months I have been using an unmanaged virtual private server from Tektonic, and I love it.

What's that? Let's take it one word at a time, and then some more.

  1. It's a server: which means it's a full-ish linux installation. So it is capable of doing lots of things. I can run all sorts of weird python thingies in it if I want. IMAPS and SSMTP? No problemo.
  2. It's private: which means I am root on it. I have the shell. I choose what to install.
  3. It's virtual: it's a Virtuozzo partition in a real server. That means no custom kernel modules, and that since almost everything is shared with other instances, 5GB of disk and 128MB of RAM go a long way.
  4. It's unmanaged: which means I manage it. Which is just the way I prefer it, since that's my job.
  5. It's cheap. I started on an 8-dollars-a-month plan (which doesn't seem to be there anymore; the current cheapest is a 15-dollar plan).
  6. It's a throwaway. I want to host some client as a favour? I just put it there. I could even rent another of these servers for a while, use it, then close it. Backups? Clicking on a webpage saves the image! Other than that... I back it up.
  7. Fixed IPs. All you want (for extra coins).
  8. A home away from home. All my stuff is there. I need it, I get it. Without bothering about having my own server at home via no-ip or somesuch (which of course I still have too ;-)
  9. It works. It hardly ever breaks. And having survived expensive, managed servers, this baby is working just as well.
  10. It's a nice gift. Suppose you have a connection to a free software project/LUG/family/whatever, and they need a place on the internet. Why not sponsor them with something like this? I offered one to PyAr (which didn't take it, but it's the thought that counts ;-)
  11. The ultimate learning experience: you can restore the system in 2 minutes. Want to play/learn sysadmining? Do it on the real virtual thing! Much cheaper than hosing your own box ;-)
  12. They offer a good service. So, people should know about it. And of course... if you know a similar, but even better deal... I'm all ears!

2006-10-05 13:38

A different UNIX Part II: A better shell language

One of the things people study when they "learn unix" is shell scripting and usage. Because every system has a shell, and if you learn to use it interactively, you are half way there to automating system tasks!

Let's consider that for a moment... what are the odds that the same language can be good for interactive use and for programming? I say slim.

Not to mention that learning shell as a way to learn unix is like going to a school that teaches TV production, and studying the remote. While useful, not really the important tool (ok, that analogy doesn't work at all. But it sounds neat, doesn't it?).

The first thing is that today's Linux domination of the unixsphere has caused a serious monoculture in shell scripting: everyone uses bash. The more enlightened ones may check that their scripts work on some other Bourne-style shell.

There are no important distributions (or proprietary unixes) that use a csh or anything like it. Debian has a policy that things should work without bashisms. That's about as good as it gets.

Writing a dozen pages on how shell sucks would be trivial. But uninteresting.

So, let's think it over, and start from the top.

What should a shell scripting language be like?

What doesn't matter?

Let's tackle these things. I invite anyone to add extra ideas in the comments section.

What should a shell scripting language be like?

  • Interpreted (obvious)

  • Dynamic typing (you will be switching ints to strs and vice versa all the time).

  • Easy incorporation of other programs as functions/methods/whatever.

    That pretty much is what makes it a shell. ls should be indistinguishable from something written using the shell itself.

  • Pipes. This is a must. Unix has a bazillion tools meant to be used in command pipelines. You can implement an RDBMS using that kind of thing (look for nosql). Leverage that.

    But even here, on its strength, the shell is not perfect. Why can't I easily pipe stderr and stdout to different processes? Why can't I pipe the same thing to two processes at the same time (yes, I know how to do it with a neat trick ;-)

  • Globbing. *.txt should give you a list of files. This is one of the obvious things where sh is broken. *.txt may be a string or a list, depending on context... and a list is just a series of strings with blanks. That is one of the bazillion things that make writing shell scripts (at least good ones) hard:

    $ echo *out
    a.out
    $ echo *outa
    *outa
    
  • A list data type. No, writing strings separated with spaces is not ok. Maybe a python-style dictionary as well?

  • Functions (obvious)

  • Libraries (and ok, the shell source mechanism seems good enough)

  • Standalone. It shouldn't spawn sh for any reason ;-)
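
To illustrate the globbing and list points above with an existing language (Python here, purely for comparison; it is not the shell language being proposed): a glob always yields a list, and a pattern that matches nothing yields an empty list instead of the pattern itself. The helper name `matches` is made up for the sketch.

```python
import glob
import os


def matches(pattern, directory="."):
    """Return the sorted names in `directory` matching `pattern`, always as a list."""
    paths = glob.glob(os.path.join(directory, pattern))
    return sorted(os.path.basename(path) for path in paths)
```

In a folder holding only a.out, `matches("*out")` gives a one-element list and `matches("*outa")` gives an empty one, so a loop over the result simply runs zero times instead of processing a file literally named `*outa`.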

What doesn't matter?

  • Performance. Ok, it matters that a five-liner doesn't take 50 minutes unless it has to. But one second or two seconds? Not that important.
  • Object orientation. I don't see it being too useful. Shell scripts are old-fashioned :-)
  • Compatibility with current shells. Come on. Why be like something that sucks? ;-)

Now, the example

Let's consider a typical piece of shell script and a rewrite in a more reasonable syntax.

This is bash (no, it doesn't work on any other shell, I think):

DAEMONS=( syslog network cron )

# Start daemons
for daemon in "${DAEMONS[@]}"; do
      if [ "$daemon" = "${daemon#!}" ]; then
              if [ "$daemon" = "${daemon#@}" ]; then
                      /etc/rc.d/$daemon start
              else
                      stat_bkgd "Starting ${daemon:1}"
                      (/etc/rc.d/${daemon:1} start) &>/dev/null &
              fi
      fi
done

And since DAEMONS is something the admin writes, this script lets you shoot yourself in the foot in half a dozen ways, too.

How about this:

DAEMONS=["syslog","network","cron"]

# Start daemons
for daemon in DAEMONS {
      if ( daemon[0] != "!" ) {
              if ( daemon[0] == "@" ) {
                      stat_bkgd ("Starting "+daemon[1:])
                      /etc/rc.d/+daemon[1:] ("start") &> /dev/null &
              } else {
                      /etc/rc.d/+daemon ("start")
              }
      }
}

Of course the syntax is something I just made up as I was writing, but isn't it nicer already?
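
For comparison, here is the same logic in an existing language with real lists and string indexing, Python. It is only a sketch: `plan` and `start` are names invented here, and `print` stands in for stat_bkgd, which the bash snippet uses but never defines.

```python
import subprocess

# "!" disables an entry and "@" starts it in the background, as in the bash version.
DAEMONS = ["syslog", "!network", "@cron"]


def plan(daemons):
    """Return (name, in_background) pairs, skipping entries disabled with '!'."""
    steps = []
    for daemon in daemons:
        if daemon.startswith("!"):
            continue
        if daemon.startswith("@"):
            steps.append((daemon[1:], True))
        else:
            steps.append((daemon, False))
    return steps


def start(daemons, rc_dir="/etc/rc.d"):
    """Run each daemon's rc script with 'start', in the background when asked to."""
    for name, in_background in plan(daemons):
        command = [f"{rc_dir}/{name}", "start"]
        if in_background:
            print(f"Starting {name}")  # stand-in for stat_bkgd
            subprocess.Popen(command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        else:
            subprocess.run(command)
```

It is safer than the bash original, but calling another program takes a list and a library call instead of just writing its name, which is exactly the part a shell language should make effortless.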

2006-10-04 20:15

Try htop

Ever needed a process monitor that runs in a terminal? Have you been using top? Use Htop instead. Much, much, much nicer!

Contents © 2000-2019 Roberto Alsina