Infovore.Info

scouring the infosphere

Octopress Setup and Drupal Migration

I published my post on choosing Octopress as my static site generator today, so I thought I’d go into my setup and migration process a bit. I didn’t have many posts to migrate from my Drupal-based site (fewer than 10), so I could have done the migration by hand, or just left them on the server and pointed a different subdomain at them. There wasn’t any PageRank to lose, since they never had any traffic. But I was curious what the migration process would be like, so I went ahead with it.

Octopress is a static site generator based on Jekyll, with some nice preconfiguration and customizations added. Jekyll is a Ruby project that Octopress pulls in as a gem. The Jekyll project seems to have a small community around it, probably helped by the fact that it makes it easy to add GitHub project pages. Most of the advice I’ve seen for starting a Jekyll site suggests finding someone else’s project on GitHub and forking it so you don’t have to start from scratch. Octopress is designed to be forked and modified as the base for a site. It has a Rakefile with tasks for initial configuration, page and post creation, and deployment, so it can be a time-saver compared to cobbling together your own Jekyll-based site and workflow.

POW

The Octopress documentation recommends installing something new to me, perhaps because I don’t do much with Ruby: Pow, a Rack server built on Node. Once installed, Pow runs all the time and handles requests for a particular top-level domain, .dev by default. It’s really very simple: you create a symlink in ~/.pow, and the symlink’s name becomes the domain. I symlinked my modified Octopress directory as “infovore” in ~/.pow, and now http://infovore.dev is served out of that directory. Pretty nice. I don’t have to start up a server or remember which port to use. It also doesn’t matter that the site is just a directory of static files: Pow serves static files out of a public directory inside the linked directory, which is exactly how Octopress is set up.
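The Pow setup is just a symlink. Here is a minimal Ruby sketch of that step, assuming the ~/.pow directory described above; link_into_pow and pow_static_root are hypothetical helpers, and the demo uses a temporary directory rather than touching the real ~/.pow.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: symlink a site checkout into Pow's host directory.
# The link's name becomes the host, so "infovore" is served as http://infovore.dev.
def link_into_pow(site_dir, name, pow_dir: File.expand_path("~/.pow"))
  FileUtils.mkdir_p(pow_dir)
  link = File.join(pow_dir, name)
  # Leave an existing link alone so the helper can be re-run safely.
  FileUtils.ln_s(File.expand_path(site_dir), link) unless File.symlink?(link)
  link
end

# Pow serves static files from the linked directory's public/ subdirectory,
# which is where Octopress writes its generated site.
def pow_static_root(link)
  File.join(File.realpath(link), "public")
end

# Demo against throwaway directories instead of the real ~/.pow.
Dir.mktmpdir do |tmp|
  site = File.join(tmp, "octopress")
  FileUtils.mkdir_p(File.join(site, "public"))
  link = link_into_pow(site, "infovore", pow_dir: File.join(tmp, ".pow"))
  puts File.symlink?(link)                # prints "true"
  puts Dir.exist?(pow_static_root(link))  # prints "true"
end
```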

Jekyll Import From Drupal

To import my posts from my Drupal site, I had to do some reading on the Jekyll wiki. The migrator connects directly to your Drupal database, and mine is on my Slicehost VPS. I considered doing what I used to do for development: export the production database and import it into a local MySQL database. But I figured it was simpler just to connect directly to the remote server. Rather than use the MySQL installed with MAMP, I installed MySQL using Homebrew, then installed the Sequel and MySQL Ruby gems, as noted at the top of the Jekyll Drupal migrator file.

As a side note, I already had rvm (Ruby Version Manager) installed, so the gems were installed to my user-local config directory at ~/.rvm.

In order to connect to MySQL on the remote server using SSH, I needed to forward a local port to the remote port, using something like ssh -L 3306:localhost:3306 infovore.info. I was able to connect to the remote MySQL server and run a few queries to re-familiarize myself with the Drupal schema.
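Port forwarding is the only plumbing here. The sketch below just assembles the ssh command and a matching Sequel options hash so the two agree on the local port; tunnel_plan and the database name, user, and password are hypothetical. Two details are worth hedging in. If a Homebrew MySQL server is already listening on local port 3306, ssh cannot bind that port, so the demo uses 3307. And the client should connect to 127.0.0.1, because MySQL client libraries treat “localhost” as a request to use the local Unix socket, which would bypass the tunnel.

```ruby
# Hypothetical helper: build the SSH port-forward command and matching
# Sequel connection options so both agree on the local port.
def tunnel_plan(ssh_host, database:, user:, password:, local_port: 3306, remote_port: 3306)
  # -N: forward only, run no remote command.
  # -L local:host:remote: "localhost" here is resolved on the remote server.
  ssh_argv = ["ssh", "-N", "-L", "#{local_port}:localhost:#{remote_port}", ssh_host]

  # 127.0.0.1 forces TCP; "localhost" would make the MySQL client use a Unix socket.
  db_opts = {
    adapter: "mysql",
    host: "127.0.0.1",
    port: local_port,
    database: database,
    user: user,
    password: password
  }
  [ssh_argv, db_opts]
end

argv, opts = tunnel_plan("infovore.info", database: "drupal", user: "drupal",
                         password: "secret", local_port: 3307)
puts argv.join(" ")  # prints "ssh -N -L 3307:localhost:3306 infovore.info"
```

With the sequel and mysql gems installed, the tunnel could be started with `pid = Process.spawn(*argv)` and the database opened with `Sequel.connect(opts)`; `Process.kill("TERM", pid)` closes the tunnel when the import is done.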

The current migration script is pretty basic and doesn’t pull some of the metadata, such as tags. It also creates redirect pages only for the base “/node/{nodeId}” URLs, not for the rewritten “pretty” URLs. Nor does it get the markup format. I had played around with both Textile and Markdown on the few posts I wrote on the original Drupal blog, and since Octopress/Jekyll can handle both, I wanted to pull them all out with the correct file extension.
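Jekyll chooses its converter from a post’s file extension, so the format name from Drupal only needs mapping to an extension. A minimal sketch, assuming the site’s input formats have “Markdown” or “Textile” somewhere in their names (Drupal format names are configurable, so check your own filter_formats table); extension_for is a hypothetical helper.

```ruby
# Hypothetical mapping from a Drupal input-format name (filter_formats.name)
# to a file extension Jekyll will render with the matching converter.
def extension_for(format_name)
  name = format_name.to_s.downcase
  if name.include?("markdown")
    "markdown"
  elsif name.include?("textile")
    "textile"
  else
    # Filtered/Full HTML bodies can be kept as plain HTML posts.
    "html"
  end
end

puts extension_for("Markdown")       # prints "markdown"
puts extension_for("Filtered HTML")  # prints "html"
```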

The query went from this:

SELECT node.nid,
       node.title,
       node_revisions.body,
       node.created,
       node.status
FROM node,
     node_revisions
WHERE (node.type = 'blog' OR node.type = 'story')
  AND node.vid = node_revisions.vid

to this:

SELECT node.nid,
       node.title,
       node_revisions.body,
       node.created,
       node.status,
       f.name AS format,
       u.dst
FROM node
  JOIN node_revisions ON node.vid = node_revisions.vid
  JOIN filter_formats f ON node_revisions.format = f.format
  LEFT JOIN url_alias u ON CONCAT('node/', node.nid) = u.src
WHERE (node.type = 'blog' OR node.type = 'story' OR node.type = 'article')
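The dst column is what makes it possible to redirect the pretty URLs as well as /node/{nodeId}. A minimal sketch, assuming plain HTML meta-refresh pages (not necessarily what the stock migrator writes); redirect_pages and the target URL format are hypothetical.

```ruby
require "erb"

# Hypothetical helper: build static redirect pages for one migrated post.
# Returns { relative_path => html } covering the raw Drupal path (node/<nid>)
# and, when the query found one, its pretty alias from url_alias.dst.
def redirect_pages(nid, dst, target_url)
  url = ERB::Util.html_escape(target_url)
  html = <<~HTML
    <!DOCTYPE html>
    <html><head>
    <meta http-equiv="refresh" content="0; url=#{url}">
    <link rel="canonical" href="#{url}">
    </head><body><a href="#{url}">This post has moved.</a></body></html>
  HTML

  paths = ["node/#{nid}"]
  alias_path = dst.to_s.strip.sub(%r{\A/+}, "").sub(%r{/+\z}, "")
  paths << alias_path unless alias_path.empty?
  # Each old path becomes a directory holding an index.html.
  paths.uniq.map { |p| [File.join(p, "index.html"), html] }.to_h
end

pages = redirect_pages(7, "blog/hello-world", "/blog/2010/05/01/hello-world/")
puts pages.keys.sort
```

Writing each pair under Octopress’s source/ directory (for example with FileUtils.mkdir_p and File.write) would let Jekyll copy the pages into the generated site unchanged.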

This required a bit of digging into the data to get the right structure. I also added another query to get the tags as I looped over each post. This could have taken quite a long time if I had many posts, but it would only be a one-time hit.
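One query per post is fine for a handful of posts. With many posts, a common alternative is to fetch every node/tag pair in a single query and group the rows in Ruby. This is a sketch of that grouping, not what the migrator here does; the query in the comment assumes Drupal 6’s term_node and term_data tables, and tags_by_node plus the nid/name column names are assumptions.

```ruby
# One-pass alternative to a per-post tag query. Rows would come from a single
# query such as (Drupal 6 table names; verify against your schema):
#   SELECT tn.nid, td.name FROM term_node tn JOIN term_data td ON td.tid = tn.tid
# Sequel yields each row as a Hash with symbol keys, like the literals below.
def tags_by_node(rows)
  rows.group_by { |row| row[:nid] }
      .transform_values { |rs| rs.map { |r| r[:name] }.uniq.sort } # uniq drops duplicate rows
end

rows = [
  { nid: 1, name: "ruby" },
  { nid: 2, name: "drupal" },
  { nid: 1, name: "jekyll" }
]
p tags_by_node(rows)
```

Inside the per-post loop, `tags_by_node(rows).fetch(post[:nid], [])` would then stand in for the extra query.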

Once I had the data, I needed to change the YAML metadata output at the top of each post. I changed the original migrator to output the tags and a published value, since I had a few unpublished posts.
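A sketch of that front matter, assuming Ruby’s bundled YAML library and the query’s created (a Unix timestamp) and status (1 for published) columns; the front_matter helper and its key names are hypothetical. Letting YAML.dump do the quoting keeps a title containing a colon from breaking the header.

```ruby
require "yaml"

# Hypothetical helper: build the YAML front matter for one migrated post.
def front_matter(title:, created:, published:, tags: [])
  data = {
    "layout"    => "post",
    "title"     => title.to_s,
    # node.created is a Unix timestamp; formatted in the local time zone.
    "date"      => Time.at(created).strftime("%Y-%m-%d %H:%M"),
    # Jekyll skips posts marked published: false when generating the site.
    "published" => published,
    "tags"      => tags
  }
  data.delete("tags") if tags.empty?
  # YAML.dump emits the opening "---"; the closing one ends the front matter.
  YAML.dump(data) + "---\n"
end

# e.g. published: row[:status] == 1 when building from a query row
puts front_matter(title: "Hello: World", created: 1_273_000_000,
                  published: true, tags: ["drupal", "jekyll"])
```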

I also had to change where the files are saved, since Octopress stores posts in a different location than the default Jekyll location.
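A sketch of the path change, assuming Octopress’s source/_posts directory and Jekyll’s date-prefixed post file names; post_path and its slug rule are hypothetical.

```ruby
# Hypothetical helper: where a migrated post lands in an Octopress checkout.
# Octopress keeps posts in source/_posts instead of Jekyll's default _posts,
# and Jekyll expects file names of the form YYYY-MM-DD-slug.ext.
def post_path(title, created, ext, root: "source/_posts")
  # Simple slug: runs of anything but ASCII letters/digits become one hyphen,
  # so non-ASCII characters are dropped.
  slug = title.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
  date = Time.at(created).strftime("%Y-%m-%d")
  File.join(root, "#{date}-#{slug}.#{ext}")
end

puts post_path("Hello, World!", Time.new(2010, 5, 1, 12).to_i, "markdown")
# prints "source/_posts/2010-05-01-hello-world.markdown"
```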

I’m not saving my repo on GitHub, so I’m posting the modified file as a gist on GitHub. I added a Rake task to push to Amazon S3, but I’ll deal with that in a later post.
