Scraping doesn't hurt

2012-02-17 20:34

I am in general allergic to HTML, especially when it comes to parsing it. However, every now and then something comes up and it's fun to keep the muscles stretched.

So, consider the TED Talks site. They have a really nice table with information about their talks, just in case you want to do something with them.

But how do you get that information? By scraping it. And what's an easy way to do it? By using Python and BeautifulSoup:

from BeautifulSoup import BeautifulSoup
import urllib

# Read the whole page.
data = urllib.urlopen('http://www.ted.com/talks/quick-list').read()
# Parse it
soup = BeautifulSoup(data)

# Find the table with the data
table = soup.findAll('table', attrs={"class": "downloads notranslate"})[0]
# Get the rows, skip the first one
rows = table.findAll('tr')[1:]

items = []
# For each row, get the data
# And store it somewhere
for row in rows:
    cells = row.findAll('td')
    item = {}
    item['date'] = cells[0].text
    item['event'] = cells[1].text
    item['title'] = cells[2].text
    item['duration'] = cells[3].text
    item['links'] = [a['href'] for a in cells[4].findAll('a')]
    items.append(item)

And that's it! Surprisingly pain-free!
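
The loop above leaves everything in `items`, a list of plain dictionaries, so "storing it somewhere" is mostly a matter of picking a format. Here is a minimal sketch, assuming JSON is good enough and that you are on Python 3. The sample row, the `save_items` and `load_items` helpers and the file name are all made up for illustration; none of them are part of the script above or of the real TED data:

```python
import json
import os
import tempfile


def save_items(items, path):
    """Write the list of scraped rows to `path` as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(items, f, indent=2, ensure_ascii=False)


def load_items(path):
    """Read back a list previously written by save_items()."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical row, shaped like the dictionaries built in the loop above.
sample_items = [
    {
        "date": "Feb 2012",
        "event": "Example Event",
        "title": "An example talk",
        "duration": "18:00",
        "links": ["http://example.com/low.mp4", "http://example.com/high.mp4"],
    }
]

# Hypothetical file name, placed in the system's temporary directory.
path = os.path.join(tempfile.gettempdir(), "ted_talks.json")
save_items(sample_items, path)
restored = load_items(path)
```

Since every value is either a string or a list of strings, the data survives the round trip unchanged, and the file stays readable by hand.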

To write, and to write what.

2012-02-17 03:18

Some of you may know I have written about 30% of a book, called "Python No Muerde", available at http://nomuerde.netmanagers.com.ar (in Spanish only). That book has stagnated for a long time.

On the other hand, I wrote a very popular series of posts, called PyQt by Example, which has (you guessed it) stagnated for a long time.

The main problem with the book was that I tried to cover way too much ground. When complete, it would be a 500-page book, and that would involve writing half a dozen example apps, some of them in areas where I am no expert.

The main problem with the post series is that the example is lame (a TODO app!) and expanding it is boring.

So, what better way to fix both things at once than to merge them?

I will leave Python No Muerde as it is, and will do a new book, called PyQt No Muerde. It will keep the tone and language of Python No Muerde, and will even share some chapters, but will focus on developing a PyQt app or two, instead of the much more ambitious goals of Python No Muerde. It will be about 200 pages.

I have acquired permission from my superiors (my wife) to work on this project a couple of hours a day, in the early morning. So, it may move forward, or it may not. This is, as usual, an experiment, not a promise.

Antisocial Networks

2012-02-16 01:47

I love http://goodreads.com very much. It has measurably improved my life as a reader. Thanks to it I have read authors I wouldn't have read otherwise, and books from those authors that I would have ignored, and it keeps track of what I have read, am reading and will read.

What it has never been for me is a social network. I would be about as happy with it if I knew no one else on the site, if it were just me and a bazillion strangers whose taste I can leech off.

Sure, I have a few friends there nowadays, but I hardly ever do anything "social" beyond accepting requests and posting reviews which I have no idea if someone reads.

I love Flickr, where I put most of my pictures (soon: all of my pictures). It's cheap, I can upload an almost infinite amount of pics there, and sometimes I can share them with friends and family (by reposting them to Facebook).

They were even kind enough to store the pictures I uploaded as a free user until I paid for the space to store them 5 years later.

I love Twitter because it's a place to post short things that don't deserve a blog post, to chatter with friends and not-so-friends, to know more people, and to waste some time every day.

One of those things is not like the others. One of those things I use for its social features, the others I use for other reasons, and I don't really care about them being social or not.

I think nowadays, for a social network to succeed, it has to cater to the antisocial, at least at first, when you know no one there. I don't go to Flickr to debate. I don't go to Goodreads to chat. I go there to put pictures and keep my books straight. And that's what kept me there long enough to meet people.

The blogs I don't have

2012-02-15 20:36

Things you only like or believe because your mom said so.
Tips for Time Travelers.
Cute plants and their antics.
1001 ways to peel a cat.
Things morticians say.
Traveling for Time Tippers.
Coins of the world: what do they taste like?
Things found in people's noses.
Surprise, that is not chicken!
Time for Tip Travelers.
World of Lint.

Kill the Dead (Sandman Slim, #2)

2012-02-14 00:00

Author: Richard Kadrey
Rating: ★ ★ ★
See in goodreads

Review:

This is not a book. It's the second half of "Sandman Slim". Read that and then decide if you actually want to read this second half or not.

Me, I liked it, thank you very much.

Ralsina.Me — Roberto Alsina's website
