paolodona.com

Oct
06
Mon

Rails bulk insert at the speed of light!

For a project I’m working on, I needed to bulk insert a large amount of data into a MySQL table from a Rails app.

I usually rely on the ar-extensions plugin’s import feature, but this time I really needed it to be as quick as possible, so I started looking for alternatives.

Given that I’m using MySQL, which provides a pretty good LOAD DATA INFILE statement, it seemed pretty stupid not to use it.

That’s why I’m introducing a brand new Rails plugin that allows you to bulk insert records at the speed of light: import_with_load_data_in_file.

You can find it on Github:

http://github.com/paolodona/import_with_load_data_in_file

1. How to install it?

$ cd vendor/plugins
$ git clone git://github.com/paolodona/import_with_load_data_in_file.git

2. How to use it?

  • include the ImportWithLoadDataInFile module in your AR model, say MyModel.
  • call the MyModel.import_with_load_data_infile(cols, vals) method, as you would with ar-extensions import.

3. Example

# Table name: users
#
# id :integer(11) not null, primary key
# name :string(20)
# surname :string(32)
#
class User < ActiveRecord::Base
  # you need to include this module
  include ImportWithLoadDataInFile
end

cols = [:name, :surname]
vals = [["paolo", "dona"], ["james", "dean"]]
User.import_with_load_data_infile(cols, vals)
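
For the curious, here is a rough sketch of the general technique: dump the rows to a temporary tab-separated file, then point LOAD DATA LOCAL INFILE at it. This is not the plugin's actual code. The `LoadDataSketch` module and its method names are made up for illustration, and the sketch assumes MySQL's default LOAD DATA format (tab-separated fields, newline-terminated lines, backslash escapes).

```ruby
require "tempfile"

# Hypothetical helper, NOT the plugin's implementation.
module LoadDataSketch
  ESCAPES = { "\\" => "\\\\", "\t" => "\\t", "\n" => "\\n" }.freeze

  # Escape one value for LOAD DATA's default format.
  def self.escape_field(value)
    return "\\N" if value.nil? # \N is read back as NULL
    value.to_s.gsub(/[\\\t\n]/) { |char| ESCAPES[char] }
  end

  # Write one tab-separated line per row.
  def self.write_rows(io, rows)
    rows.each do |row|
      io.write(row.map { |value| escape_field(value) }.join("\t") + "\n")
    end
    io.flush
  end

  # Build the statement. Table and column names are interpolated as-is,
  # so they are assumed to come from trusted code, and the path is
  # assumed to contain no quotes.
  def self.load_data_sql(path, table, cols)
    "LOAD DATA LOCAL INFILE '#{path}' INTO TABLE #{table} (#{cols.join(', ')})"
  end
end

cols = [:name, :surname]
vals = [["paolo", "dona"], ["james", "dean"]]

Tempfile.create("bulk_insert") do |file|
  LoadDataSketch.write_rows(file, vals)
  sql = LoadDataSketch.load_data_sql(file.path, "users", cols)
  # With a live MySQL connection you would now run something like:
  # ActiveRecord::Base.connection.execute(sql)
  puts sql
end
```

The speed comes from handing MySQL a single file to parse instead of many INSERT statements.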

4. Disclaimer

This works only on MySQL, and only if “LOAD DATA LOCAL INFILE” is enabled on both the client and the server. See the MySQL manual page “Security Issues with LOAD DATA LOCAL” for more info.

5. Benchmarks

Unofficial benchmarks say it’s up to 30% faster than ar-extensions import. There is still room for improvement: I could use Unix named pipes instead of a physical file, and escaped strings could be cached.

Aug
04
Mon

I moved to London

Hi guys, this blog has been very quiet lately… I know.

What is keeping me busy at the moment is my relocation to London. Well, yes… after spending some time thinking about it (admittedly, not too much) Francesca and I decided to move and enjoy a new experience abroad.

This is something I used to dream of, and now it’s reality! There’s no plan to go back in the short term… I mean, it’s not a vacation, we want to live here for a while and see what happens, improving our poor English in the meantime.

If you want to see me in Italy, you now need to book me in advance :-)

If you happen to be in London, just give me a call and I’ll be more than happy to meet you!

If you’re wondering what’s going on at SeeSaw… well, just stay tuned and everything will be explained in a couple of weeks.

Hugs and kisses.

PS: I’ve tried to eat “Spaghetti alla Carbonara” here and got badly sick… I promise I’ll never eat Italian food again!

Jul
06
Sun

Colorize RSpec stories output

UPDATE: This approach is no longer necessary as you can pass --colour to your story runner. You can still use the script for plugging in growl support.

If you’re into autotest and redgreen, you probably miss your colored test output while running your RSpec stories.

A standard story output looks like this:

You can pipe it to this simple colorize script and get:

... yellow pending messages, or:

... red failures.

That helps me spot where pendings and failures are when running large stories. PS: this script eats up lines if you’re running the debugger. Feel free to improve it and send back patches to me. PPS: the color stuff has been ripped off from redgreen.

This script also plugs in Growl support, if Growl is found in your path.
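
The original script is not reproduced here, so here is a minimal sketch of the idea: a filter that wraps matching lines in ANSI color codes. The `PENDING` and `FAILED` markers are assumptions about what the story runner prints, and the method names are made up. Adjust the patterns to your real output.

```ruby
# ANSI color codes for the terminal.
COLORS = { red: 31, yellow: 33 }.freeze

def colorize(text, color)
  "\e[#{COLORS.fetch(color)}m#{text}\e[0m"
end

# Assumed markers: lines mentioning FAILED turn red, PENDING turn yellow.
def colorize_line(line)
  case line
  when /FAILED/i  then colorize(line, :red)
  when /PENDING/i then colorize(line, :yellow)
  else line
  end
end

# Copy input to output, coloring line by line.
def colorize_stream(input, output)
  input.each_line { |line| output.puts(colorize_line(line.chomp)) }
end
```

Saved as a script that ends with `colorize_stream($stdin, $stdout)`, it can sit at the end of a pipe after your story runner.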

Paolo

Jul
02
Wed

Check the database timezone at startup

If your application is dealing with different time zones and code which relies on them, it could be handy to ensure the database is using the time zone you expect it to.

Drop something like this in a Rails initializer:

#
# Check that the database is running in UTC
# and stop if it's not.
#
db_now = ActiveRecord::Base.connection.execute( 
    "select now() as now" ).fetch_hash['now']
utc_now = Time.now.utc.to_s(:db)

if db_now != utc_now
  raise LoadError, "ERROR: Database is not in UTC" 
end

btw: you should thank david, not me.
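
One caveat: comparing the two strings exactly can fail if the clock ticks over to the next second between the two calls. A minimal sketch of a more tolerant comparison follows. The `database_in_utc?` name and the 5-second tolerance are made up for illustration, and the input is assumed to look like MySQL's `YYYY-MM-DD HH:MM:SS`.

```ruby
# Hypothetical helper: true when the database's now() string is within
# `tolerance` seconds of the given UTC time.
def database_in_utc?(db_now, utc_now = Time.now.utc, tolerance = 5)
  # Pull out year, month, day, hour, minute, second and read them as UTC.
  parts = db_now.to_s.scan(/\d+/).first(6).map(&:to_i)
  db_time = Time.utc(*parts)
  (db_time - utc_now).abs <= tolerance
end
```

In the initializer, you would raise the error when `database_in_utc?(db_now)` returns false.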

Jun
12
Thu

GTD: how to manage your todos and next actions with GMail

About a year ago I posted about a way to manage your todo list with GMail.

The concept is simple: label your emails as TODO and search through them. Next actions are just starred TODOs. Labelling and filtering your email with GMail is a snap, but in order to have cool “todo” and “next action” links available through the interface I had to use Greasemonkey and saved searches.

Nowadays, with Google updating GMail and me switching to OS X/Safari, that solution doesn’t work anymore. Luckily enough, I noticed that GMail searches are now bookmarkable… so here is a recap of the steps for setting up your minimal todo list:

  1. create a new contact named todo whose email is your_account_name+todo@gmail.com (eg: paolo.dona+todo@gmail.com)
  2. create a label named todo
  3. create a filter matching to:(your_account_name+todo@gmail.com) and flag: Skip Inbox, Apply label “todo”.
  4. now search your mail for label:todo and bookmark the page as [todo]
  5. search your mail for label:todo is:starred and bookmark the page as [next action]

You should get something like (todo):

and (next action):

Have you noticed the [todo] and [next action] bookmarks I put in my Bookmarks Bar? I know this is not a solution that suits everyone’s needs, but it brings a couple of pros:

  • I can add filters that automatically populate my todo list from incoming emails.
  • I can play with labels and easily let todo items belong to projects, locations etc (GTD style).
  • I do not have to implement my own todo list manager as most rails programmers do :)

Happy GTDing!

May
21
Wed

Motorbike riding training in Adria

  • Do you love to ride your motorbike?
  • Are you willing to improve your handling skills?
  • Do you live in north-eastern Italy?

There’s a nice opportunity for you! Check out this cool non-profit initiative from a couple of friends of mine!

May
05
Mon

From DVD to YouTube

As some of you may already know, I have a close friend who is an opera singer. Yesterday I had to extract and convert a few excerpts of his DVD and put them on YouTube. Here are the steps it took:

  1. Convert the DVD to an .mp4 file with Handbrake
  2. Cut out the excerpts with SimpleMovieX (it lets me use precise cut points; for example, I had to extract from 1:50:23 to 1:52:25) and save them.
  3. Import the excerpts into iMovie 08 and add titles and transitions. Normalize audio (DVD audio level is usually very low).
  4. Export resulting video to a YouTube friendly format (.m4v)
  5. Upload to YouTube.

You can see the final result on Domenico Menini’s YouTube Channel.

Are there better/faster ways to achieve the same result?

Feb
22
Fri

TextMate Filter Through Command

Sometimes I have to deal with messy HTML like this:

As you know, indenting it manually is a pain… but I just found this cool TextMate feature, Filter Through Command:

You can filter your file through a shell command and replace its content with the output… here I’ve chosen the tidy that ships with OS X to clean up my HTML:

And voilà, your HTML gets nicely formatted in a snap…

Now I just need to find a way to automatically strip tidy’s comments from the top of my document… but the annoying part is done. Pretty simple but clever stuff.
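
One possible way to get a cleaner result is a small Ruby wrapper used as the filter command. This is only a sketch: `tidy_html` and `TIDY_COMMAND` are made-up names, and it assumes your tidy accepts `-indent`, `-quiet` and `--tidy-mark no` (the last one asks tidy not to add its generator tag). Warnings go to stderr, which is simply dropped here.

```ruby
require "open3"

# Assumed flags: indent the markup, suppress the summary chatter,
# and skip tidy's own generator tag.
TIDY_COMMAND = ["tidy", "-indent", "-quiet", "--tidy-mark", "no"].freeze

# Feed the HTML to the command on stdin and return only its stdout.
# tidy exits non-zero on warnings, so the status is not checked.
def tidy_html(html, command = TIDY_COMMAND)
  stdout, _stderr, _status = Open3.capture3(*command, stdin_data: html)
  stdout
end
```

Whether this removes exactly the lines that show up at the top of your document depends on your tidy version, so treat it as a starting point.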

If you know better ways to do this… let me know!