Migrating from Haloscan to Disqus (if you can comment on it, it worked ;-)

Introduction

If you are a Haloscan user and are starting to wonder what you can do, this page explains a way to take your comments to Disqus, another free comment service.

A few days ago, Haloscan announced they were stopping their free comment service for blogs. Guess which service holds the last 9 years of comments on this blog? Yes, Haloscan.

They offered a simple migration to their Echo platform, which you have to pay for. While Echo looks like a perfectly nice comment platform, I am not going to spend any money on this blog if I can help it, since it already eats a lot of my time.

Luckily, the guys at Haloscan allow exporting the comments (that used to be only for their premium accounts), so thanks Haloscan, it has been nice!

So, I started researching where I could run to. There seem to be two large free comment systems: Disqus and Intense Debate.

Keep in mind that my main interest lies in not losing almost ten years of comments, not in how great the service is. That being said, they both seem to offer roughly the same features.

Let's consider how you can import comments to each service:

  • Disqus: It can import from Blogger and some other hosted blog services. Not from Haloscan.
  • Intense Debate: Can import from some hosted services, and from some files. Not from the file Haloscan gave me.

So, what is a guy to do? Write a Python program, of course! Here's where Disqus won: they have a public API for posting comments.

So, all I have to do then is:

  1. Grok the Disqus API
  2. Grok the Haloscan comments file (it's XML)
  3. Create the necessary threads and whatever in Disqus
  4. Post the comments from Haloscan to Disqus
  5. Hack the blog so the links to Haloscan now work for Disqus

Piece of cake. It only took me half a day, which at my current rates is what 3 years of Echo would have cost me, but where's the fun in paying?

So, let's go step by step.

1. Grok the Disqus API

Luckily, there is a reasonable Disqus Python client library, and docs for the API, so this was not hard.

Just get the library and install it:

hg clone https://[email protected]/IanLewis/disqus-python-client/
cd disqus-python-client
python setup.py install

The API usage we need is really simple, so study the API docs for 15 minutes if you want. I got almost all the tips I needed from this PyBlosxom import script.

Basically:

  1. Get your API key
  2. Log in
  3. Get the right "forum" (you can use a Disqus account for more than one blog)
  4. Post to the right thread

2. Grok the Haloscan comments file

Not only is it XML, it's pretty simple XML!

Here's a taste:

<?xml version="1.0" encoding="iso-8859-1" ?>
<comments>
    <thread id="BB546">
      <comment>
        <datetime>2007-04-07T10:21:54-05:00</datetime>
        <name>superstoned</name>
        <email>[email protected]</email>
        <uri></uri>
        <ip>86.92.111.236</ip>
        <text><![CDATA[that is one hell of a cool website ;-)]]></text>
      </comment>
      <comment>
        <datetime>2007-04-07T16:14:53-05:00</datetime>
        <name>Remi Villatel</name>
        <email>[email protected]</email>
        <uri></uri>
        <ip>77.216.206.65</ip>
        <text><![CDATA[Thank you for these rare minutes of sweetness in this rough world...]]></text>
      </comment>
    </thread>
</comments>

So, a comments tag that contains one or more thread tags, which contain one or more comment tags. Piece of cake to traverse using ElementTree!
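Here's a minimal sketch of that traversal, just to show the shape of it. The sample is a trimmed copy of the export above (no XML declaration, fewer fields), and list_comments is a name I made up for this illustration.

```python
from xml.etree import ElementTree

# Trimmed sample of a Haloscan export: comments > thread > comment
SAMPLE = """<comments>
    <thread id="BB546">
      <comment>
        <datetime>2007-04-07T10:21:54-05:00</datetime>
        <name>superstoned</name>
        <uri></uri>
        <text><![CDATA[that is one hell of a cool website ;-)]]></text>
      </comment>
      <comment>
        <datetime>2007-04-07T16:14:53-05:00</datetime>
        <name>Remi Villatel</name>
        <uri></uri>
        <text><![CDATA[Thank you for these rare minutes of sweetness in this rough world...]]></text>
      </comment>
    </thread>
</comments>"""


def list_comments(xml_text):
    """Return a (thread id, author, text) tuple for every comment."""
    root = ElementTree.fromstring(xml_text)
    rows = []
    for thread in root.findall('thread'):
        for comment in thread.findall('comment'):
            # Empty tags come back as None, so fall back to a default
            name = comment.find('name').text or 'Anonymous'
            text = comment.find('text').text or ''
            rows.append((thread.attrib['id'], name, text))
    return rows


for row in list_comments(SAMPLE):
    print(row)
```

The CDATA wrapper needs no special handling: ElementTree hands you the content as the plain text of the tag.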

There is an obvious match between comments and threads in Haloscan and Disqus. Good.

3. Create the necessary threads and whatever in Disqus

This is the tricky part, really, because it requires some things from your blog.

  • You must have a permalink for each post
  • Each permalink should be a separate page. You can't have permalinks with # in the URL
  • You need to know what Haloscan ID you used for each post's comments, and what the permalink for each post is.

For example, suppose you have a post at http://ralsina.me/weblog/posts/ADV0.html and it has a Haloscan comments link like this:

<a href="javascript:HaloScan('ADV0');" target="_self"> <script type="text/javascript">postCount('ADV0');</script></a>

You know where else that 'ADV0' appears? In Haloscan's XML file, of course! It's the "id" attribute of a thread.

Also, the title of this post is "Advogato post for 2000-01-17 17:19:57" (hey, it's my blog ;-)

Got that?

Then we want to create a thread in Disqus with that exact same data:

  • URL
  • Thread ID
  • Title

The bad news is... you need to gather this information for your entire blog and store it somewhere. If you are lucky, you may be able to get it from a database, as I did. If not... well, it's going to be a lot of work :-(
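If your posts exist as HTML pages, part of that gathering can probably be scripted. This is only a sketch under some assumptions: each page has exactly one Haloscan link like the one shown earlier, and the post title is whatever sits in the page's title tag (that depends entirely on your templates). The names thread_entry, HALOSCAN_ID and TITLE are made up for this example.

```python
import re

# The id inside javascript:HaloScan('...') in a Haloscan comments link
HALOSCAN_ID = re.compile(r"javascript:HaloScan\('([^']+)'\)")
# Assumption: the post title is the page's <title>; adjust for your templates
TITLE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def thread_entry(url, html):
    """Return (haloscan_id, (url, title)) for one post page, or None."""
    id_match = HALOSCAN_ID.search(html)
    title_match = TITLE.search(html)
    if not id_match or not title_match:
        return None
    return id_match.group(1), (url, title_match.group(1).strip())


# A made-up page that looks like the example post
page = """<html><head><title>Advogato post for 2000-01-17 17:19:57</title></head>
<body>
<a href="javascript:HaloScan('ADV0');" target="_self">comments</a>
</body></html>"""

entry = thread_entry('http://ralsina.me/weblog/posts/ADV0.html', page)
print(entry)
```

Run it over every post page, drop the None results, and you have the id, URL and title for each thread.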

For the purpose of this explanation, I will assume you got that data nicely in a dictionary indexed by thread id:

{
  id1: (url, title),
  id2: (url, title)
}
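Before importing anything, it's worth checking that the dictionary covers every thread in the Haloscan export. A small sketch of that check, where missing_threads is a hypothetical helper and the data is made up:

```python
from xml.etree import ElementTree


def missing_threads(xml_text, threads):
    """Return the thread ids found in the export but not in threads."""
    root = ElementTree.fromstring(xml_text)
    exported = [thread.attrib['id'] for thread in root.findall('thread')]
    return [t_id for t_id in exported if t_id not in threads]


# Made-up export with two threads, and a dictionary that only knows one
export = """<comments>
    <thread id="ADV0"></thread>
    <thread id="BB546"></thread>
</comments>"""

known = {
    'ADV0': ('http://ralsina.me/weblog/posts/ADV0.html', 'My first post'),
}

print(missing_threads(export, known))
```

Anything it reports is a thread whose comments you have but whose post URL and title you still need to dig up.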

4. Post the comments from Haloscan to Disqus

Here's the code. It's not really tested as shown, because it took me several attempts and fixes along the way, but it should be close to OK.

#!/usr/bin/python
# -*- coding: utf-8 -*-

# Read all comments from a CAIF file, the XML haloscan exports

from disqus import DisqusService
from xml.etree import ElementTree
from datetime import datetime
import time


# Obviously these should be YOUR comment threads ;-)
threads={
    'ADV0': ('http://ralsina.me/weblog/posts/ADV0.html','My first post'),
    'ADV1': ('http://ralsina.me/weblog/posts/ADV1.html','My second post'),
    }

key='USE YOUR API KEY HERE'
ds=DisqusService()
ds.login(key)
forum=ds.get_forum_list()[0]

def importThread(node):
    t_id=node.attrib['id']

    # Your haloscan thread data; skip threads missing from the dictionary
    thr_data=threads.get(t_id)
    if thr_data is None:
        print('Skipping unknown thread: %s' % t_id)
        return

    # A Disqus thread: it will be created if needed
    thread=ds.thread_by_identifier(forum,t_id,t_id)['thread']

    # Set the disqus thread data to match your blog
    ds.update_thread(forum, thread, url=thr_data[0], title=thr_data[1])


    # Now post all the comments in this thread
    for comment in node.findall('comment'):
        # Only the first 19 characters are parsed, so the UTC offset is dropped
        dt=datetime.strptime(comment.find('datetime').text[:19],'%Y-%m-%dT%H:%M:%S')
        name=comment.find('name').text or 'Anonymous'
        email=comment.find('email').text or ''
        uri=comment.find('uri').text or ''
        text=comment.find('text').text or 'No text'

        print('-'*80)
        print('Name: %s' % name)
        print('Email: %s' % email)
        print('Date: %s' % dt)
        print('URL: %s' % uri)
        print('')
        print('Text:')
        print(text)

        print(ds.create_post(forum, thread, text, name, email,
                             created_at=dt, author_url=uri))
        # Wait a second between posts
        time.sleep(1)

def importComments(fname):
    tree=ElementTree.parse(fname)
    for node in tree.findall('thread'):
        importThread(node)


# Replace comments.xml with the file you downloaded from Haloscan
importComments('comments.xml')
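One detail about dates: the script only parses the first 19 characters of each datetime, so the "-05:00" offset is thrown away and comments keep Haloscan's local time. I don't know which timezone Disqus assumes for created_at. If it turns out to expect UTC (that is an assumption, check the API docs), a helper along these lines could do the conversion. to_utc is a name made up for this sketch, and it assumes the offset always has the "+HH:MM" or "-HH:MM" form seen in the export.

```python
from datetime import datetime, timedelta


def to_utc(stamp):
    """Convert '2007-04-07T10:21:54-05:00' to a naive datetime in UTC."""
    local = datetime.strptime(stamp[:19], '%Y-%m-%dT%H:%M:%S')
    # Characters after the seconds look like '-05:00' or '+02:00'
    sign = 1 if stamp[19] == '+' else -1
    hours, minutes = int(stamp[20:22]), int(stamp[23:25])
    offset = sign * timedelta(hours=hours, minutes=minutes)
    # Local time = UTC + offset, so subtract the offset to get back to UTC
    return local - offset


print(to_utc('2007-04-07T10:21:54-05:00'))
```

To use it, you would replace the strptime line in the script with dt=to_utc(...) on the same text.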

Now, if we are lucky, you already have a nice and fully functioning collection of comments in your Disqus account, and you can relax knowing you have not lost your data. Ready for the final step?
