Roll your own lazy loading collection

Today I realized (once again) how much I like the new AREL way of building queries, especially that the query is only executed at the moment you access the iterator (by calling each, for instance):

customers = Customer.where(:status => "approved").order("name DESC")
# No SQL fired so far

customers.each do |customer|
  puts "Name: #{customer.name}"
end
# Calling each, will fire the SQL - nice!

Ok, you all know that already, so why bother you with it?!

Because I realized (once again) how nicely this works together with fragment caching in Rails, and how much I miss this feature when I'm not working with ActiveRecord but doing something just as time-consuming as a database query.

Today, that other time-consuming task was parsing an RSS feed downloaded from a WordPress blog and displaying each retrieved article on each and every page. Of course this is something you want to cache. And you want to cache both steps: fetching and parsing the RSS, and rendering each article.

In the old days I would implement cumbersome duplicate code in both the view and the controller to handle the caching correctly:

# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base

  before_filter :fetch_blog_entries

  private
  
    def fetch_blog_entries
      unless fragment_exists?("blog_entries")
        @blog_entries = BlogEntry.find(:all)
      end
    end

end

And here the corresponding view code:

/ app/views/layouts/_blog_entries.html.haml
#blog-entries
  %h2
    Latest Blog Posts
  - cache "blog_entries", :expires_in => 1.hour do
    - @blog_entries.each do |blog_entry|
      #blog-entry
        %h3
          = link_to blog_entry.title, blog_entry.link

        = raw blog_entry.content
        %p{"id" => "creator-info"}
          written by #{blog_entry.creator}, #{time_ago_in_words(blog_entry.created_at)}

I'll admit that in this simple case the duplication isn't a real pain, but hey, we all need some fun from time to time ;-) So I went on to remove that nasty double cache-check logic. And here we come back to where I started: I wanted something like the AREL behaviour for this one too! And since Ruby is a great language with great support for closures, this wouldn't be much of a problem, right? Right!

Here is the implementation of the BlogEntry.find method before I added support for lazy loading the RSS feed:

# app/models/blog_entry.rb
class BlogEntry
  class <<self

    def find(*args)
      options = args.extract_options!

      feed = Nokogiri::XML(open(MyApp::Application.config.blog_feed_url))
      feed.xpath("//item").collect do |item|
        BlogEntry.new(extract_attributes_from_feed_item(item))
      end
    end

    def extract_attributes_from_feed_item(item)
      attributes = {}
      
      item.xpath("*/text()").each do |text|
        attribute_name = case text.parent.name
                         when "pubDate" then :created_at
                         when "encoded" then :content
                         else
                           text.parent.name.to_sym if accessible_attributes.include?(text.parent.name)
                         end
        attributes[attribute_name] = text.content if attribute_name.present?
      end

      return attributes
    end

  end
end

To get a lazy loaded collection of all the articles in the feed I just need a collection wrapper to store my closures:

# app/models/lazy_load_collection.rb
class LazyLoadCollection
  include Enumerable

  def initialize(lazy_collection, after_load_callback = nil)
    @lazy_collection     = lazy_collection
    @after_load_callback = after_load_callback.present? ? after_load_callback : lambda { |args| return args }
  end

  def each(&block)
    collection.each(&block)
  end

  private

    def collection
      @collection ||= @after_load_callback.call(@lazy_collection.call)
    end
end

As you can see, I had to add some callback functionality to process the raw collection before returning it to the caller. In this case that means creating BlogEntry instances. If no callback is given, I just pass the collection to a no-op callback.

Backed by the new LazyLoadCollection class, the BlogEntry.find method can be refactored to the following:

# app/models/blog_entry.rb
class BlogEntry
  class <<self

    def find(*args)
      options = args.extract_options!

      lazy_feed = lambda { Nokogiri::XML(open(MyApp::Application.config.blog_feed_url)) }
      create_blog_entries = lambda { |feed|
        feed.xpath("//item").collect do |item|
          BlogEntry.new(BlogEntry.extract_attributes_from_feed_item(item))
        end
      }

      LazyLoadCollection.new(lazy_feed, create_blog_entries)
    end

    def extract_attributes_from_feed_item(item)
      attributes = {}
      
      item.xpath("*/text()").each do |text|
        attribute_name = case text.parent.name
                         when "pubDate" then :created_at
                         when "encoded" then :content
                         else
                           text.parent.name.to_sym if accessible_attributes.include?(text.parent.name)
                         end
        attributes[attribute_name] = text.content if attribute_name.present?
      end

      return attributes
    end

  end
end

And that's it! We now have a nice and clean little lazy loading collection, even with a callback feature included. All in less than 20 lines of code. Ruby is just amazing, isn't it ;-)
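To see the lazy behaviour in isolation, here is a minimal sketch that runs without Rails. It is a plain-Ruby variant of the class above (the default callback is chosen with `||` instead of ActiveSupport's `present?`), and the loader is a hypothetical stand-in for the feed download that only counts how often it gets called.

```ruby
# Plain-Ruby variant of LazyLoadCollection (no ActiveSupport needed).
class LazyLoadCollection
  include Enumerable

  def initialize(lazy_collection, after_load_callback = nil)
    @lazy_collection     = lazy_collection
    # Fall back to a no-op callback that returns the raw collection unchanged.
    @after_load_callback = after_load_callback || lambda { |raw| raw }
  end

  def each(&block)
    collection.each(&block)
  end

  private

  # Runs the loader and the callback once, then reuses the result.
  def collection
    @collection ||= @after_load_callback.call(@lazy_collection.call)
  end
end

# Hypothetical stand-in for the expensive feed download: it only counts calls.
load_count = 0
slow_loader = lambda do
  load_count += 1
  ["first post", "second post"]
end
upcase_titles = lambda { |titles| titles.map(&:upcase) }

entries = LazyLoadCollection.new(slow_loader, upcase_titles)
puts "loader calls before iterating: #{load_count}"

entries.each { |title| puts title }  # first access triggers the load
entries.each { |title| puts title }  # second pass reuses the memoized result
puts "loader calls after two passes: #{load_count}"
```

With this in place the controller no longer needs the fragment_exists? check: it can assign the result of BlogEntry.find(:all) unconditionally, because the feed is only fetched when the cache block in the view misses and actually calls each.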

Deploying Node.js with Capistrano and Cluster

Currently we're building a backend system to support the customer care and sales department of our client. This week the users came up with the requirement to have real-time notifications of data changes. Such stuff generally sounds like a lot of fun and the chance to drop in some top-notch technology. Thus we decided to build this notification system on top of Node.js and Socket.IO.

The Node server acts as a simple relay for JSON payloads sent from the backend system, which is a Rails 3 app. The JSON payload is then broadcast to the connected Socket.IO clients (i.e. the users' browsers). The implementation is really simple and not worth describing further. But I'm a big fan of deploying a system as early as possible. This is especially true for systems that introduce new technologies to an existing infrastructure, which was indeed the case here. So it was the deployment that challenged me, and this is how I solved it.

Installing Node.js and npm on the server

First things first. Before we can even think about deploying our brand new real-time notification system on a server, we need the basic software installed: Node and npm. Both were installed via Chef. We used the nodejs package from the Opscode cookbooks repository and added a dead simple npm recipe:

include_recipe "nodejs"

execute "install_npm" do
  command "curl http://npmjs.org/install.sh | sudo sh"
  user "root"
  group "root"
  not_if "which npm"
end

So far, so good – that was the easy part. Now for the real fun ...

Capistrano isn't just for Rails

Having been a full-time Rails developer for a while now, I almost immediately associate the word "deployment" with Capistrano. The main advantage of using Capistrano (or any other weapon of your choice) is consistency in usage. For me this is maybe one of the most important goals to achieve, from writing the code itself all the way down to managing a server.

With Capistrano, the only thing you have to care about is how to control your server process, which is a Node.js process in this case. It turns out that controlling such a beast isn't as easy as controlling a Unicorn app server, for instance. In detail, this means there is no way to detach the Node server process other than using standard operating system methods:

nohup node server_script.js &

Unfortunately, this alone doesn't really help. Of course, I could have used other tools provided by the OS like init scripts or, as mentioned in this article, upstart. But I like the idea of keeping the deployment details for my apps in one place rather than spreading one part across the app repo and the other across my Chef scripts. OK, but that demanded another way of controlling my little Node server.

Meet Cluster

On my quest for salvation I finally found the Cluster package, which turned out to do the job just well enough. I still have to start the Node process through the above-mentioned nohup construct, but now a PID file is generated and a CLI for restarting and stopping the process is available. There is even a status command, but only with Node.js version 0.4.1 or higher.

Wrapping it all together in the three Capistrano tasks deploy:start, deploy:stop and deploy:restart, I ran into yet another problem: running the nohup command directly via Capistrano's run method doesn't work. No error is raised, but there is no server up and running either. I didn't fully understand the underlying problem, but I assume it had something to do with the pseudo-TTY Capistrano allocates. The solution was as simple as encapsulating the start command in a separate Rake task.

But 'nuff said, here is the corresponding Capistrano code:

namespace :deploy do
  desc "Restarting the Node.js process"
  task :restart, :roles => :app, :except => { :no_release => true } do
    run <<-SHELL
      if [ -f #{shared_path}/pids/master.pid ]; \
      then cd #{current_path} && node push_server.js restart; \
      fi && cd #{current_path} && rake server:start
    SHELL
  end

  desc "Starting the Node.js process"
  task :start, :roles => :app do
    run "cd #{current_path} && rake server:start"
  end

  desc "Stopping the app Node.js process"
  task :stop, :roles => :app do
    run <<-SHELL
      cd #{current_path} && node push_server.js shutdown && \
      rm #{shared_path}/pids/*.pid
    SHELL
  end
end
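The server:start Rake task called by the Capistrano tasks isn't shown here. A minimal sketch of what it might look like, assuming the nohup construct from above: the script name push_server.js comes from the Capistrano code, while the helper name node_start_command, the file location and the log path are made up for illustration. Redirecting the output is a common precaution when backgrounding a process over SSH; whether the original task did that is not something I can confirm.

```ruby
# lib/tasks/server.rake (hypothetical location)
require "rake"

# Builds the shell command that starts the Node process detached from the
# calling shell. The log path is an assumption, not part of the original setup.
def node_start_command(script = "push_server.js", log = "log/push_server.log")
  "nohup node #{script} >> #{log} 2>&1 &"
end

namespace :server do
  desc "Start the Node.js push server in the background"
  task :start do
    # sh hands the command to the shell, which returns right after backgrounding it.
    sh node_start_command
  end
end
```

Keeping the command in a small helper makes it easy to check the exact string on the console before letting Capistrano run it.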

Bonus round: Managing NPM dependencies

As I said, I prefer keeping the app stuff together in one place. And being spoiled by Bundler's dependency management, I wanted at least to define the npm dependencies for the app in the repository. I used a simple Rake task to get a script that installs the dependencies for free:

desc "Install npm dependencies"
task :dependencies do

  npm_dependencies = [
    "express",
    "socket.io",
    "cluster"
  ]

  npm_dependencies.each do |dep|
    system "sudo npm install #{dep}"
  end
end

The only flaw of this solution is the use of sudo, which has been the recommended way of installing npm packages since 0.3.0. This isn't really a problem on a local dev machine, but it is quite annoying when your deployment user doesn't have any sudo privileges. Because of this, I first have to update the code on the remote machine to make the new list of dependencies available, and then log into that machine to run the Rake task as a privileged user. On the other hand, the current app is too simple and too small to justify a more sophisticated solution ;-)

Reminder: You definitely want to use a process monitoring solution like monit or god to take care of your pretty Node.js processes. But that is a completely different story.

Have a custom .irbrc for a Rails 3 project

If you are a Rails console hacker like tisba and me, you may find this really useful:

Today tisba had the problem of having to run the same piece of code each time he starts rails console. Obviously this would be done in ~/.irbrc. But wait! If the code in question is project specific, which it indeed was, it would break every attempt to start an IRB session outside of that project. So we asked ourselves: "Wouldn't it be great to have a project-specific .irbrc?" Yeah, it would!

Thanks to Rails' great API it took me only a few minutes (actually less than writing up this blog post ...) to come up with this simple method:

# File: config/application.rb
require File.expand_path('../boot', __FILE__)

require 'rails/all'

# If you have a Gemfile, require the gems listed there, including any gems
# you've limited to :test, :development, or :production.
Bundler.require(:default, Rails.env) if defined?(Bundler)

module MyApp
  class Application < Rails::Application
    # config stuff comes here

    def load_console(sandbox=false)
      super
      if File.exist?(project_specific_irbrc = File.join(Rails.root, ".irbrc"))
        puts "Loading project specific .irbrc ..."
        load(project_specific_irbrc)
      end
    end

  end
end

Just add it to your MyApp::Application class (the one defined in config/application.rb) and add your very own custom .irbrc in your Rails root directory.

Update: tisba mentioned it would be nice if the "Loading" line were only printed if the file actually exists. I'm sure you're going to agree with that too. What you see now is the updated version of that snippet.
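What goes into the project's .irbrc is entirely up to you. A minimal sketch, assuming you want a small timing helper in every console session of this project; the helper name time_it is made up for illustration, and the file deliberately uses plain Ruby only.

```ruby
# .irbrc in the Rails root (hypothetical contents)
require "benchmark"

# Times the given block, prints the elapsed seconds and returns the block's result.
def time_it(label = "elapsed")
  result = nil
  seconds = Benchmark.realtime { result = yield }
  puts format("%s: %.3fs", label, seconds)
  result
end
```

In the console you could then wrap any expression in it, for example time_it("sum") { (1..1_000).reduce(:+) }.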

TextMate Tip: Fix the 'Run'-Command and Bundler

Recently, today to be precise, one team member got a Bundler::PathError while running any test case from within TextMate. Running tests from the command line didn't cause any errors, though. This is really annoying, as pressing ⌘-R is just too convenient not to use. Digging deeper into the problem, I remembered fixing it for myself a while ago. Okay, that's great! So if you ever stumble across that problem, here are the two snippets I used to fix it:

#!/bin/sh

export RUBYLIB="$TM_BUNDLE_SUPPORT/RubyMate${RUBYLIB:+:$RUBYLIB}"
export TM_RUBY=$(type -p "${TM_RUBY:-ruby}")

cd "${TM_PROJECT_DIRECTORY:-$TM_DIRECTORY}"

"${TM_RUBY}" -KU -- "$TM_BUNDLE_SUPPORT/RubyMate/run_script.rb"

The one above is to fix the Run-Command. Just open the Bundle Editor and select Commands in the drop down (or just use the ⌃⌥⌘-C shortcut to jump right there). Find the Run-Command and replace the content with the one above.

Eh, you're right, I talked about "both snippets" so here is the other one to fix the Run Focused Unit Test-Command (which is quite handy, at least for me). You should find that command entry right below the Run-Command. Just replace its content with the following code.

#!/bin/sh

export RUBYLIB="$TM_BUNDLE_SUPPORT/RubyMate${RUBYLIB:+:$RUBYLIB}"
export TM_RUBY=$(type -p "${TM_RUBY:-ruby}")

cd "${TM_PROJECT_DIRECTORY:-$TM_DIRECTORY}"

"${TM_RUBY}" -KU -- "$TM_BUNDLE_SUPPORT/RubyMate/run_script.rb" --name=

That's it. Now you're ready to go. Hope that helped. At least I will remember that one faster the next time, hopefully.

BTW: If you're using RVM, be sure to read their guide on integrating it with TextMate or you will run right into the next problem ;-).

 