Ralsina.Me — Roberto Alsina's website

Smiljan, a Small Planet Generator

I maintain a couple of small "planet" sites. If you are not familiar with planets, they are sites that aggregate the RSS/Atom feeds of a group of people who are related somehow. The result is a nice, single, thematic feed.

Recently, when moving them from one server to another, everything broke. Old posts showed up as new, and feeds that had not been updated in 2 years always had all their posts on top... a disaster.

I could have gone back to the old server and started debugging why rawdog was doing that, or switched to planet, or looked for other software, or used an online aggregator.

Instead, I started thinking... I had written a few RSS aggregators in the past... Feedparser is again under active development... rawdog and planet seem to be pretty much abandoned... how hard could it be to implement the minimal planet software?

Well, not all that hard, that's how hard it was. It took me about 4 hours, and it was not even difficult.

One reason why this was easier than what planet and rawdog achieved is that I am not writing a static site generator, because I already have one. So all I need this program (I called it Smiljan) to do is:

  • Parse a list of feeds and store it in a database if needed.

  • Download those feeds (respecting ETag and If-Modified-Since).

  • Parse those feeds looking for entries (feedparser does that).

  • Load those entries (or rather, a tiny subset of their data) in the database.

  • Use the entries to generate a set of files to feed Nikola.

  • Use Nikola to generate and deploy the site.
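
The "respecting ETag and If-Modified-Since" step is what keeps the planet polite: the aggregator tells the server what it already has, and the server can answer 304 Not Modified instead of sending the whole feed again. In Smiljan, feedparser takes care of this when it is given the stored etag and modified values. The sketch below only illustrates which HTTP headers are involved, using the standard library and Python 3; the function name and the sample values are made up for the example.

```python
from email.utils import formatdate


def conditional_headers(etag=None, modified_ts=None):
    """Return the HTTP headers for a conditional GET (hypothetical helper).

    etag: the ETag string saved from the previous download, if any.
    modified_ts: POSIX timestamp of the last known modification, if any.
    """
    headers = {}
    if etag:
        # The server compares this against the feed's current ETag.
        headers["If-None-Match"] = etag
    if modified_ts is not None:
        # HTTP dates are always expressed in GMT.
        headers["If-Modified-Since"] = formatdate(modified_ts, usegmt=True)
    return headers


# A feed never seen before has nothing to send; a known one sends both.
first_time = conditional_headers()
known_feed = conditional_headers(etag='"abc123"', modified_ts=0)
print(known_feed)
```

If the server replies with status 304, there is nothing new to parse and the stored entries can be left alone.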

So, here is the final result: http://planeta.python.org.ar which still needs theming and a lot of other stuff, but works.

I implemented Smiljan as 3 doit tasks, which makes it very easy to integrate with Nikola (if you know Nikola: add "from smiljan import *" in your dodo.py and a feeds file with the feed list in rawdog format) and voilà, running this updates the planet:

doit load_feeds update_feeds generate_posts render_site deploy
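
For reference, this is roughly how the feeds file gets read. The sketch is a standalone Python 3 version of the parsing loop in task_load_feeds, and it assumes only the small part of the rawdog format that Smiljan looks at: a "feed <period> <url>" line followed by a "define_name <name>" line. The sample URLs and names are invented.

```python
def parse_feed_list(lines):
    """Return a list of (url, name) pairs from rawdog-style config lines."""
    feeds = []
    url = name = None
    for raw_line in lines:
        parts = raw_line.strip().split()
        if not parts:
            continue
        if parts[0] == "feed" and len(parts) >= 3:
            # "feed 20m http://..." -> the URL is the third field
            url = parts[2]
        elif parts[0] == "define_name" and len(parts) >= 2:
            # Names may contain spaces, so join everything after the keyword
            name = " ".join(parts[1:])
        if url and name:
            feeds.append((url, name))
            url = name = None
    return feeds


sample = """\
feed 20m http://example.com/alice/rss
    define_name Alice Example
feed 20m http://example.org/bob/atom.xml
    define_name Bob Example
"""

feeds = parse_feed_list(sample.splitlines())
for url, name in feeds:
    print(name, url)
```

Any other rawdog directives are simply ignored, which is all a minimal planet needs.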

Here is the code for smiljan.py, currently at the "gross hack that kinda works" stage. Enjoy!

# -*- coding: utf-8 -*-
import codecs
import datetime
import glob
import os
import sys

from doit.tools import timeout
import feedparser
import peewee


class Feed(peewee.Model):
    name = peewee.CharField()
    url = peewee.CharField(max_length = 200)
    last_status = peewee.CharField()
    etag = peewee.CharField(max_length = 200)
    last_modified = peewee.DateTimeField()

class Entry(peewee.Model):
    date = peewee.DateTimeField()
    feed = peewee.ForeignKeyField(Feed)
    content = peewee.TextField(max_length = 20000)
    link = peewee.CharField(max_length = 200)
    title = peewee.CharField(max_length = 200)
    guid = peewee.CharField(max_length = 200)

Feed.create_table(fail_silently=True)
Entry.create_table(fail_silently=True)

def task_load_feeds():
    feeds = []
    feed = name = None
    for line in open('feeds'):
        line = line.strip()
        if line.startswith('feed'):
            feed = line.split(' ')[2]
        if line.startswith('define_name'):
            name = ' '.join(line.split(' ')[1:])
        if feed and name:
            feeds.append([feed, name])
            feed = name = None

    def add_feed(name, url):
        f = Feed.create(
            name=name,
            url=url,
            etag='caca',
            last_modified=datetime.datetime(1970,1,1),
            )
        f.save()

    def update_feed_url(feed, url):
        feed.url = url
        feed.save()

    for feed, name in feeds:
        f = Feed.select().where(name=name)
        if not list(f):
            yield {
                'name': name,
                'actions': ((add_feed,(name, feed)),),
                'file_dep': ['feeds'],
                }
        elif list(f)[0].url != feed:
            yield {
                'name': 'updating:'+name,
                'actions': ((update_feed_url,(list(f)[0], feed)),),
                }


def task_update_feeds():
    def update_feed(feed):
        modified = feed.last_modified.timetuple()
        etag = feed.etag
        parsed = feedparser.parse(feed.url,
            etag=etag,
            modified=modified
        )
        try:
            feed.last_status = str(parsed.status)
        except AttributeError:  # No HTTP status: probably a timeout
            # TODO: log failure
            return
        if parsed.feed.get('title'):
            print parsed.feed.title
        else:
            print feed.url
        feed.etag = parsed.get('etag', 'caca')
        modified = tuple(parsed.get('date_parsed', (1970,1,1)))[:6]
        print "==========>", modified
        modified = datetime.datetime(*modified)
        feed.last_modified = modified
        feed.save()
        # No point in adding items from missing feeds
        if parsed.status >= 400:
            # TODO log failure
            return
        for entry_data in parsed.entries:
            print "========================================="
            date = entry_data.get('updated_parsed', None)
            if date is None:
                date = entry_data.get('published_parsed', None)
            if date is None:
                print "Can't parse date from:"
                print entry_data
                return False
            date = datetime.datetime(*(date[:6]))
            title = "%s: %s" %(feed.name, entry_data.get('title', 'Sin título'))
            content = entry_data.get('description',
                    entry_data.get('summary', 'Sin contenido'))
            guid = entry_data.get('guid', entry_data.link)
            link = entry_data.link
            print repr([date, title])
            entry = Entry.get_or_create(
                date = date,
                title = title,
                content = content,
                guid=guid,
                feed=feed,
                link=link,
            )
            entry.save()
    for feed in Feed.select():
        yield {
            'name': feed.name.encode('utf8'),
            'actions': [(update_feed,(feed,))],
            'uptodate': [timeout(datetime.timedelta(minutes=20))],
            }

def task_generate_posts():

    def generate_post(entry):
        meta_path = os.path.join('posts',str(entry.id)+'.meta')
        post_path = os.path.join('posts',str(entry.id)+'.txt')
        with codecs.open(meta_path, 'wb+', 'utf8') as fd:
            fd.write(u'%s\n' % entry.title.replace('\n', ' '))
            fd.write(u'%s\n' % entry.id)
            fd.write(u'%s\n' % entry.date.strftime('%Y/%m/%d %H:%M'))
            fd.write(u'\n')
            fd.write(u'%s\n' % entry.link)
        with codecs.open(post_path, 'wb+', 'utf8') as fd:
            fd.write(u'.. raw:: html\n\n')
            content = entry.content
            if not content:
                content = u'Sin contenido'
            for line in content.splitlines():
                fd.write(u'    %s\n' % line)

    for entry in Entry.select().order_by(('date', 'desc')):
        yield {
            'name': entry.id,
            'actions': [(generate_post, (entry,))],
            }
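
One detail of update_feed that is easy to miss: feedparser hands dates over as time.struct_time values, and the code turns them into datetime objects by unpacking the first six fields. Here is a minimal Python 3 sketch of that step, with the same "updated first, then published" fallback. The helper name is hypothetical, and a plain dict stands in for a feedparser entry.

```python
import datetime
import time


def entry_datetime(entry_data):
    """Return a datetime for the entry, or None if no date was parsed."""
    for key in ("updated_parsed", "published_parsed"):
        parsed = entry_data.get(key)
        if parsed is not None:
            # struct_time slices like a tuple: year, month, day, h, m, s
            return datetime.datetime(*parsed[:6])
    return None


# A plain dict standing in for a feedparser entry (assumption for the demo).
entry = {"published_parsed": time.gmtime(0)}
when = entry_datetime(entry)
print(when)
```

Returning None instead of failing leaves it to the caller to decide whether an undated entry should be skipped or should stop the whole feed.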

Nikola 2.1.1 + GitHub

By popular request, Nikola now has its source code at GitHub.

Also, if you tried version 2.1 and it failed, try 2.1.1, because I forgot to add a couple of files in one of the themes in 2.1.

Food for Guilt (Alimento para la Culpa)

(Originally written in Spanish.)


About two years ago I wrote a piece of a book. Every now and then I look at it fondly and think it would be nice to finish it, or to rescope it and wrap it up, and things like that. I even made a plan that I never managed to put into practice, because life takes you off to do other things.

It turns out my humble chunk of a book is being used as study material in the course IWI-131 Programación de computadores - 1er Semestre 2012 at the Universidad Técnica Federico Santa María in Chile.

On one hand, it makes me happy. On the other hand, it makes me nervous that a book that quotes this is study material for 18-year-olds:

Until a man is twenty-five, he still thinks, every so often, that under the right circumstances he could be the baddest motherfucker in the world. If I moved to a martial-arts monastery in China and studied real hard for ten years. If my family was wiped out by Colombian drug dealers and I swore myself to revenge. If I got a fatal disease, had one year to live, and devoted it to wiping out street crime. If I just dropped out and devoted my life to being bad.

Neal Stephenson (Snow Crash)

On yet another hand, it makes me want to finish it. On the last hand (which means that what I have is a quadrangular feeling), I think: how strange it would be to be handed a book at university that is all incomplete. It's like becoming a fan of The Event and never finding out what the famous Event was, like having watched Twin Peaks and never shaking off the doubts about what all that bizarreness was. Like still waiting for Mel Brooks to make the second part of History of the World so you can understand "Jews in Space".

Now I could leave it unfinished as an artistic decision!

And of course:


Contents © 2000-2024 Roberto Alsina