Adding indexes with Padrino migrations

I really like the Padrino web framework. Its simplicity and layer of just enough above Sinatra appeals to me over the burgeoning complexity of Rails of late.

As my applications have become more complex though, the lack of documentation and good examples have become an increasing issue. I’m intending to write a few blog posts on common, yet more complex things, you need to do in Padrino to use it as a Rails replacement. Hopefully it can increase uptake and help other people with issues to solve some common gaps.

My app was getting to the size where database indexing was becoming a necessity. Could I throw those in using padrino migrations without resorting to raw SQL statements in the migrations? I couldn’t tell. Googling gave me no love, but since I was using ActiveRecord one would assume it was supported. I couldn’t tell from the source code, so after a quick trip to #padrino on IRC and a nice chat with @wikimatze (where we both shrugged), I took a pg_dump and then started some cowboy database experimenting.

Short story: It works and is supported, but it’s slightly different if you’re coming from indexes in Rails migrations. Here’s what you need to know.

A basic column index

So, let’s dive right into an example. I have a Species table, chock full of a distinct 6,379 animals. In general, there are a lot of lookups via the name, which is a unique combination of the genus and species. So, the Green And Black Poison Frog, one of the famous curare frogs, is Dendrobates auratus uniquely in the database. And that’s what I need to create an index on.

So, I started with the following:

🚀  ~ padrino gen migration AddNameIndexToSpecies

which gave me a nice blank migration file which just self.up and self.down, which I modified with the change_table. The important thing to note is that you can’t use the Rails migration syntax of add_index. That just throws an error. You need to have it look a little something like this:

class AddNameIndexToSpecies < ActiveRecord::Migration
  def self.up
    change_table :species do |t|
      t.index :name, :unique => true
    end
  end

  def self.down
    change_table :species do |t|
      t.remove_index :name
    end
  end
end

I wanted to make sure the index was unique since, at least in postgres, this will also add in an additional constraint preventing anyone inserting a duplicate record on the name which, with nearly 6500 species, is a bit of a concern. Plus it also is a double check on the data validations in the models.

Running padrino rake ar:migrate actually gives you a nice B-tree index on the name column in the species table.

   INFO -  Migrating to AddNameIndexToSpecies (37)
  DEBUG -   (0.2ms)  BEGIN
==  AddNameIndexToSpecies: migrating =========================================
-- change_table(:species)
  DEBUG -   (125.8ms)  CREATE UNIQUE INDEX "index_species_on_name" ON "species
   -> 0.1285s
==  AddNameIndexToSpecies: migrated (0.1286s) ================================

  DEBUG -   (1.3ms)  INSERT INTO "schema_migrations" ("version") VALUES ('37')
  DEBUG -   (2.1ms)  COMMIT
  DEBUG -   (0.3ms)  SELECT "schema_migrations"."version" FROM "schema_migrati

You end up getting this output and bringing up the database console with psql you can get confirmation that there is actually a new index on the species column.

aark_dev-# \d species
                                              Table "public.species"
            Column            |            Type             |
------------------------------+-----------------------------+-----------------
 id                           | integer                     | not null default
 name                         | character varying(255)      |
 common_name                  | text                        |
 analog                       | text                        |
 has_phylogenetic_study       | character varying(255)      |
 order                        | character varying(255)      |
 family                       | character varying(255)      |
 genus                        | character varying(255)      |
 species                      | character varying(255)      |
 ex_situ_research_needs_notes | text                        |
 created_at                   | timestamp without time zone | not null
 updated_at                   | timestamp without time zone | not null
Indexes:
    "species_pkey" PRIMARY KEY, btree (id)
    "index_species_on_name" UNIQUE, btree (name)

Done. The speedup in my app was actually noticeable as well as confirmed by the difference in executing a page load. When people are looking at the species table, the main way they use the filtering is by name search.

Compound Indices

So, that’s pretty straightforward if you’re at all used to putting in indices from Rails. How about a compound index? One that ends up using the combination of fields to look things up. In my database, since I have both assessments and conservation programs, both of which are a unique combo organization and species (and start date technically), and quite often they’re being looked up to determine if a species is in captivity or in need of help in establishing a “bring back from the brink” program in terms of them being critically endangered. Maybe that’s too esoteric though. Let’s pick one that everyone can deal with, the Accounts or Users table.

In this app, we’ve just gone with peoples’ emails as their primary identifier. Since the app is also international, we’ve also allowed people to identify themselves with two fields, first_names and last_names in order to avoid the various issues around the way people describe themselves outside of the English speaking world. So, let’s use that. In particular, since we have a full_name function in the app which parses those two things together, an index on the both of them would probably be quite useful. How do you get an index on two columns? Again, erring on the side of safety, I issue the command line

🚀  ~ padrino gen migration AddIndexesToAccounts

and then edit the resulting migration file to look a little something like this:

class AddIndexesToAccounts < ActiveRecord::Migration
  def self.up
    change_table :accounts do |t|
      t.index :email, :unique => true
      t.index [:first_names, :last_names]
    end
  end

  def self.down
    change_table :accounts do |t|
      t.remove_index :email
      t.remove_index :first_names_and_last_names
    end
  end
end

To get compound indexes, you simply run the target columns together as an array of symbols. Running the migration, you get the surprising index_accounts_on_first_names_and_last_names which I have to admit, I didn’t see coming, so I had to alter the remove portion of this file to deal with that.

Index Types

So far though, this is only giving you the same type of index, the B-tree. It’s an excellent general purpose index for most searching and comparison, but particularly databases like PostgresSQL have other types of indexes available to them which assist things like array searching and specific applications such as full text searching.

How can you use them from Padrino? Well, at time of writing, as near as I can tell from diving down into 0.11.4, the dependency ceiling on ActiveRecord (I’m currently at 3.2.14) and below 4.0 means you can’t just pass a type of index in the migration unfortunately.

However, as soon as 0.12 has support for ActiveRecord 4.0+ I believe you should be able to use the following to be able to generate either a GIST or GIN type index to help your fulltext search in postgres or similar use case.

class AddNameIndexToSpecies < ActiveRecord::Migration
  def self.up
    change_table :species do |t|
      t.index :name, :unique => true, :using => :gin
    end
  end

  def self.down
    change_table :species do |t|
      t.remove_index :name
    end
  end
end

We’ll have to see if that bears out to be true on the 0.12 release.

Most likely though, you’re probably fine with the B-tree index for most of what you want to do. Unless you know what you’re doing, it’s probably good to stick with that. Experiment at will though.

In any case, being able to add indexes from the migrations is a great, if undocumented, feature in Padrino so I’m hoping this closes the gap somewhat. I’ll add similar notes to the migrations documentation as the padrino-core group pushes forward with revising documentation on the way to 0.12.