Archive for the ‘Education’ Category

Sort by a list of IDs queried using MySQL's IN() function

Thursday, June 26th, 2008

I run into this problem from time to time, and I've finally put enough energy into it that I want to commit it to memory through the blog. The most common occurrence of this in my world is when performing a search using something like Lucene or Sphinx. Here's what usually happens:

I perform a search using my engine of choice. Sphinx, for example, will give me back a hash of results formatted like this:

  Array
  (
    [matches] => Array
    (
      [10546] => Array
      (
        [weight] => 3
        [attrs] => Array
        (
          [feed_id] => 624
          [published_at] => 1213713545
        )
      )
      [14154] => Array
      (
        [weight] => 3
        [attrs] => Array
        (
          [feed_id] => 583
          [published_at] => 1213801410
        )
      )
      ...
    )
    [time] => 0.000
    ...
  )

Since I do not store my full-text fields in my index, I then have to take the IDs from the matches array and query the database to get at my data. The query is simple:

SELECT * FROM items WHERE id IN (10546, 14154, …);

Unfortunately, that query will not necessarily return the rows in the order I provided. This is an easy fix if I'd like to sort my results on a column in my database, since I can just add a quick ORDER BY clause on say, the date the item was created. But if I searched the database for results sorted by relevance, the database really doesn't know anything about how to sort that way. I need a way to tell MySQL to sort the results in the order I've supplied my IDs, which leads us to at least two functions: FIND_IN_SET and FIELD.

Both functions work the same way: Given a string (or column) as the first argument, they return the 1-based index — or zero if not found — of the argument in either the list of additional arguments (FIELD) or the second argument, a comma-delimited string (FIND_IN_SET). For my purposes, then, my query looks like this:

# Using FIELD
SELECT * FROM items WHERE id IN (10546, 14154, …) ORDER BY FIELD(id, 10546, 14154, …);

# Using FIND_IN_SET
SELECT * FROM items WHERE id IN (10546, 14154, …) ORDER BY FIND_IN_SET(id, '10546,14154,…');

Both functions will return the same results, sorted the same way. I went so far as to profile both, too, just to make sure I was using the best function. Granted, when I profiled this using MySQL's built-in profiler, my query was pretty simple (as it should be, in my opinion). Your mileage may vary:

  +----------+------------+------------------------------+
  | Query_ID | Duration   | Sorting result | Query       |
  +----------+------------+------------------------------+
  |        1 | 0.01572400 | 0.001129       | FIND_IN_SET |
  |        2 | 0.00241400 | 0.00127        | FIELD       |
  |        3 | 0.00194700 | 0.00114        | FIND_IN_SET |
  |        4 | 0.00193100 | 0.001121       | FIELD       |
  |        5 | 0.00192300 | 0.001116       | FIND_IN_SET |
  |        6 | 0.00192100 | 0.001101       | FIELD       |
  +----------+------------+------------------------------+

Note that the query result is a hybrid between show profiles and show profile for query [id], and I've ripped out the actual query because all I care about is the sorting function being used (the query is the same otherwise). The duration on the whole was lower in each subsequent query because of MySQL's query cache, but since I really wasn't interested in that anyway — just the sorting duration — it doesn't really matter.

The profiling I did was ridiculously contrived, but it did go to show me, based on the miniscule differences, that it most likely will not matter which function I use (although FIND_IN_SET ultimately won out over my exhaustive six-query profile).

Make Rails' Auto-linker Accept Parentheses

Wednesday, May 21st, 2008

The Rails helper, auto_link is a handy way to scan a block of text, adding HTML links as they are found. I recognized an issue with its regular expression, however, when dealing with files commonly linked to on VerticalWire.

The links in question look like this:

http://www.website.com/assets/screenshotbundle(5-20-08).zip
Unfortunately, Rails was creating links that looked something like this:
http://www.website.com/assets/screenshotbundle(5-20-08).zip
While parentheses aren't necessarily commonplace in URLs, they're still acceptable; Web browsers won't balk when they're encountered. It's clear that we need to patch our helper to accept parentheses.

Delving into the auto_link method in ActionView::Helpers::TextHelper, it's working off a rather large regular expression defined as the constant, AUTO_LINK_RE. To safely change this constant, you need to do so within the initializer block in environment.rb. If you try to define it by putting a file in the config/initializers directory, you'll get a warning that the constant has already been defined. Here's my updated definition:

Rails::Initializer.run do |config|
  module ActionView
    module Helpers
      module TextHelper
        AUTO_LINK_RE = %r{
          (                          # leading text
            <\w+.*?>|                # leading HTML tag, or
            [^=!:'"/]|               # leading punctuation, or
            |                        # nothing
          )
          (
            (?:https?://)|           # protocol spec, or
            (?:www\.)                # www.*
          )
          (
            [-\w]+                   # subdomain or domain
            (?:\.[-\w]+)*            # remaining subdomains or domain
            (?::\d+)?                # port
            (?:/(?:(?:[~\w\+@%=\(\)-]|(?:[,.;:][^\s$]))+)?)* # path
            (?:\?[\w\+@%&=.;-]+)?    # query string
            (?:\#[\w\-]*)?           # trailing anchor
          )
          ([[:punct:]]|\s|<|$)       # trailing text
        }x
      end
    end
  end
end

Update: I've created a ticket to correct this behavior.

A case for methodizing your constants

Friday, April 4th, 2008

I was testing a bit of Rails code today and came across a spot where I wanted a unit test to make sure my method was going to properly handle an upper limit on associated objects. Think of it like this:

class Release < ActiveRecord::Base
  has_many :tags
  
  MAX_TAGS = 15
  
  def associate(tag)
    self.tags << tag unless self.tags.size == MAX_TAGS
  end
end

At 15 max tags, either I'm going to clutter up my tests with dummy data, or I'm going to drive myself insane creating needless fixtures. I use Mocha to stub out methods in my tests, so I immediately asked myself, "How can I stub a constant?" You can go ahead and alter the constant in your test case using const_set, but that requires that you set it back at the end of your test. Another option is to methodize the constant and stub it out like you would any other method:

class Release < ActiveRecord::Base
  has_many :tags
  
  MAX_TAGS = 15
  
  def self.max_tags
    MAX_TAGS
  end
  
  def associate(tag)
    self.tags << tag unless self.tags.size == Release.max_tags
  end
end

Now, in your test case, you can just stub out your new class method and, provided you're now accessing the constant in your original method through its new wrapper method, you're all set:

class ReleaseTest < ActiveSupport::TestCase
  def test_associate
    Release.stubs(:max_tags).returns(1)
    
    assert_no_difference 'release(:first).tags.reload.size' do
      release(:first).associate(tags(:new))
    end
  end
end

Start Simple, End With a Flourish

Thursday, April 3rd, 2008

As I continue developing new sites, I find myself falling into development patterns. I'm trying to foster some of them, but there's one habit in particular I'm trying to break: I realize that when I began my most recent project, it was littered with JavaScript-y goodness in an effort to get the "wow" factor from users. As time has worn on, however, I've found myself ripping out more and more of this chrome, to the point where the majority of my site now functions as a "normal" Web application.

One of the first pieces of AJAX I added to this project was the ability to edit a "media bar" while still browsing the page it was on. You'd click an "edit" link in a vertically floated DIV, and it would scale upwards in sort of a light box effect, while fading out its contents and fading in controls to edit those contents. Now, don't get me wrong. This control looked cool (take my word for it). I still have the code lying around in an old Subversion check-in, because I'm a bit proud of it, actually.

The problem with my editable media bar was that ultimately, it was too stylish for its own good. The novelty wore off after a few uses, and then I was stuck with a difficult-to-maintain piece of JavaScript that never seemed to fit with the rest of the site. But I had spent like three days on it! It was so cool! Which is why it had to go: I had spent so much time on it that I had tunnel vision, and when our designer actually got around to seeing it, he didn't even make a peep; it wasn't even a blip on his radar.

Something I had to realize was that if I really wanted this element on the site, I had to also be comfortable taking it away, and reevaluating it. It may come back in a similar form in the future, but for a version 1.0 release, it wasn't worth the time I spent creating it. I've reverted to a standard link to send users to the media bar edit page, and its functionality is as good, if not better, than my initial flashy solution.

So I'm going to be starting simple on my next project. JavaScript and AJAX can wait. It has its place on The Web, but for most everyday user interface patterns, it's just overkill, and the benefits do not outweigh the development time.

Rails Cookie Testing Notes

Wednesday, January 16th, 2008

I ran into a few gotchas I wanted to document about testing and cookies in Rails while developing a user persistence system for a site. The implementation isn't really important, but the nuances of cookie accessing is what I really wanted to focus on.

In a functional test (and possibly an integration test, but I don't really touch those), you can set pre-populated cookie variables like so:

def test_cookie
  @request.cookies[:persist] = 'secret'
  get :index
  assert_response :success
end

The above code will set a cookie named "persist" to the value "secret." The subsequent request (get :index in this case) will have that cookie available to it during the action's lifecycle.

One thing to note, however, is that the cookies hash in a functional test is populated after a request has been made, and is only populated with cookies that were set during the request. What does this mean for us? It means this code will not work (assuming the index page does not set any cookies for us):

def test_cookie
  @request.cookies[:persist] = 'secret'
  get :index
  assert_equal 'secret', cookies['persist'] # This will fail as the cookies hash is empty
end

The reason the cookies hash is empty is because no cookies were set during the call to the index action. Think of the cookies hash as you would the assigns hash: It's only filled if you fill it during the request.

Another issue I ran into was testing for the deletion of a cookie. It's a simple fix, but it's something none-the-less:

def test_cookie
  # This action deletes the "persist" cookie via cookies.delete(:persist)
  delete :destroy
  assert_nil cookies['persist'] # This will fail as the value is actually empty, not nil
  assert cookies['persist'].empty? # This will succeed
end

Finally, unlike the assigns hash, you cannot access the cookies hash after a request using a symbol. You must identify the cookie's key by string, even if you set the cookie's key in your action as a symbol. There's a patch for this (Ticket #5924) but nothing is really being done about it.

def test_cookie
  # This action sets the "persist" cookie to '1'
  post :create
  assert_equal '1', cookies[:persist] # This will fail
  assert_equal '1', cookies['persist'] # This will succeed
end

So there you go. A few issues you might run into when testing with cookies.

Reconnect your BackgrounDRb database sessions

Sunday, November 11th, 2007

I ran into a problem with my BackgrounDRb scheduler last night: The MySQL server had to be restarted, and the scheduler got confused because its connection had gone away. By default, a BackgrounDRb worker will not reconnect to its database unless you explicitly tell it to do so. Here's how you can do that.

(more…)

BackgrounDRb Notes

Sunday, August 12th, 2007

UPDATE: Since this post, a new version of BackgrounDRb has been released, and as such, some, if not most or all of this information should be considered outdated. Unfortunately, I haven't had a chance to check out the new version, but I'll update this page if I end up installing it.

BackgrounDRb is a great Ruby-based scheduler, but its documentation isn't quite as mature as I'd like, so I wanted to jot down some notes to help anyone who's been trying (and possibly failing) to get everything working correctly.

Note #1: You may create a file called config/backgroundrb_schedules.yml to set up default schedules when your BackgrounDRb service starts. This is not to be confused with the -c command-line switch, which is meant to point to the location of your backgroundrb.yml file (its default location is config/backgroundrb.yml).

Note #1a: If you choose to create the above-mentioned YAML file, its format might confuse you a bit, if you're accustomed to Rails' database.yml file or its fixture YAML files: You must prepend each line (except schedule names) with a colon, since BackgrounDRb accesses its YAML variables through symbols. For example:

# The following line should NOT be prepended with a colon (schedule name)
my_worker:
  # Each value inside our my_worker key SHOULD be prepended with a colon
  :class: MyWorker
  :job_key: :my_worker_key

Note #2: If you would like a schedule to repeat at a certain interval, include the :repeat_interval key beneath your :trigger_args parameter (you can't use this parameter with the :cron_trigger trigger type because repeat intervals are built into the cron structure). If you're using a YAML configuration file, it will look something like this:

my_worker:
  :class: MyWorker
  :job_key: :my_worker_key
  :worker_method: :do_work
  :trigger_args:
  :repeat_interval: 2.minutes

The :repeat_interval parameter accepts an integer, in seconds. Since BackgroundDRb is a Rails plugin, you can freely use Rails' time methods (seconds(), minutes(), hours(), etc.) to better format your number. The current (as of August 13, 2007) BackgrounDRb documentation describes :repeat_interval at times, and :interval other times. Setting the :interval key will do nothing.

Note #3: Any call to the logger object from within your workers gets put, by default, into logs/backgroundrb.log. All server calls (startup messages and trigger executions) get logged to logs/backgroundrb_server.log. This may seem trivial, but you could be racking your brain wondering why your worker isn't properly logging its responses. It most likely is, but maybe not in the log file at which you are looking.

Note #4: The BackgrounDRb documentation chooses to access worker classes by their underscored symbol variants (e.g., the MyWorker class would be referenced as :my_worker). You can choose to access your workers by their true class names (e.g., MyWorker) for verbosity's sake.I may add more notes as I come across other gotchas. Check back occasionally for updates.

Singleton can't be dumped

Sunday, August 12th, 2007

I ran into a new error today, and after digging around my code for a bit, I figured out what was going on. I have an after_filter in my ApplicationController called set_last_url. It came from the session handling code in active_record_store, and its contents look like this:

class ApplicationController < ActionController::Base
  after_filter :set_last_url

  def set_last_url
    session[:last_url] = params
  end
end

For normal purposes, this method should be fine. Its purpose is to capture the last place you visited, and redirect you back after logging in or creating a new account. However, the params hash doesn't always contain information you'd like to store in the session object. In my case, it was an uploaded file. Here is the error along with a snippet of the backtrace:

Processing ItemsController#create (for 127.0.0.1 at 2007-08-12 16:09:38) [POST]
Session ID: 7797d006f6d7668e81a16564043e064f
Parameters: {"action"=>"create", "list_id"=>"1", "controller"=>"items", 
             "item"=>{"name"=>"Futurama", "price"=>"1.00", "url"=>"http://www.futurama.com", 
             "description"=>"DVD Set", "image"=>#}}

TypeError (singleton can't be dumped):
    /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.3/lib/action_controller/session/active_record_store.rb:83:in `dump'
    /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.3/lib/action_controller/session/active_record_store.rb:83:in `marshal'
    /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.3/lib/action_controller/session/active_record_store.rb:135:in `marshal_data!'
    …

I had two options that became readily apparent: I could change my code such that I only store the controller, action, and possibly ID parameters into the session object. Unfortunately, sometimes I add parameters to the end of my query string, and I don't want to omit them from the redirect URL. The other option was to act on the request method, which is what I ended up doing. In practice, I realized I didn't need to set the last URL after calls that were anything except GET, so I edited my filter definition a bit:

class ApplicationController < ActionController::Base
  after_filter :set_last_url

  def set_last_url
    session[:last_url] = params if request.get?
  end
end

No more error!

Save vs. Save!

Wednesday, August 8th, 2007

When learning about ActiveRecord objects, two methods usually taught are ActiveRecord::Base#save and ActiveRecord::Base#create. For beginners–especially those new to Ruby as a language–these methods are fine and unobtrusive. I'm finding myself using their counterparts, ActiveRecord::Base#save! and ActiveRecord::Base#create!, respectively as time goes on, for a few reasons.

First, a bit of an explanation of what the exclamation mark at the end of a method tells you, since it's generally different in Ruby and in ActiveRecord. In general, when talking about built-in Ruby methods, the exclamation mark tells you, "This method will return the object it was passed, edited in place." Take this example:

>> str = ' A phrase with spaces '
=> " A phrase with spaces "
>> str.strip
=> "A phrase with spaces"
>> str
=> " A phrase with spaces "
>> str.strip!
=> "A phrase with spaces"
>> str
=> "A phrase with spaces"

The String#strip method removes leading and trailing spaces from a string and returns the edited string, without editing the string object itself. Conversely, the String#strip! method also removes leading and trailing spaces from a string, but it also edits the string object itself, then returns it. You can see that subsequently displaying the str object after the call to str.strip! returns the stripped string; the string object has been edited and saved, as opposed to just edited and returned, with str.strip.

ActiveRecord handles the exclamation mark differently than Ruby, generally speaking. When we think of methods with an exclamation mark at the end in methods such as ActiveRecord::Base#save! and ActiveRecord::Base#create!, what we are told is, "Try to save or create this ActiveRecord object, but if you can't for some reason (usually validation), raise an exception." In contrast, the ActiveRecord::Base#save and ActiveRecord::Base#create methods will return true if the object was successfully saved or created, and false if not.

Why would you choose to have ActiveRecord raise an exception when trying to create an object? Here are a couple reasons:

  • Using a Rails plugin like exception_notification, you can be alerted by email if an object was not saved or created for some reason. If you have code that's silently returning false when trying to save an object, you might not notice an issue with your code. This can lead to database inconsistencies and unhappy users.
  • When testing, sometimes you have a method handle a lengthy process, such as importing a record from an external source, massaging the data, and creating various objects in the database to correspond with the incoming data. If any one of these things fails, your encapsulating method might not let you know, but if you're raising exceptions, you can do something like this:
assert_nothing_raised do
  Person.import_from_external_source
end

If anything fails, you'll get a test failure and you can delve deeper into your code to figure out what's going wrong.

I use ActiveRecord::Base#save! and ActiveRecord::Base#create! when I want to make sure an object will be created, and their non-exclaiming counterparts when it's not critical that the records be updated or created (e.g., if we're importing data for an existing person and an "Email" record already exists for the person for the specified email address, we can just test for failure based on our validations and move on, without re-importing the record).

Rescue your ActionView helpers

Friday, July 27th, 2007

Exceptions help us check for specific places where our code might throw a specific error, but they're also extremely handy when used in the general sense, in combination with the rescue keyword. In views, or any place you regularly throw around ActiveRecord associations, there is a lot of error checking to be done that might not always get caught. Rescuing a general exception in these cases can be very helpful. My problem was view code…

For a long while, I was plagued with view code that might fail if the associations tied to a particular object on which I was working were nil. Take this code, for example:

<!– people/index.rhtml –>
<%= h @person.name %><br />
<%= h @person.company.name %><br />
<%= h @person.company.ceo.phone %><br />

In a perfect world, all of these associations will exist, and the code will spit out exactly what it needs to. Unfortunately, sometimes, through data validation errors or in cases where the data just doesn't come over (say, when working with external systems), you're going to get exceptions. If the @person object exists, but the @person.company association does not, the second and third lines will fail. If the @person.company object exists but the @person.company.ceo association does not, the third line will fail. How do we check against this? Here's one way:

<!– people/index.rhtml –>
<%= h @person.name %><br />
<%= h @person.company.name if @person.company %><br />
<%= h @person.company.ceo.phone if @person.company && @person.company.ceo %><br />

As you might imagine, this code can get unmanageable very quickly. Here's a better way, through the use of helpers and the rescue keyword:

# people_helper.rb
module PeopleHelper
  def show_company_name(person)
    return h(person.company.name)
  rescue
    return ''
  end

  def show_ceo_phone(person)
    return h(person.company.ceo.phone)
  rescue
    return ''
  end
end
<!– people/index.rhtml –>
<%= @person.name %><br />
<%= show_company_name @person %><br />
<%= show_ceo_phone @person %><br />

Now, if either the company or the ceo associations do not exist, Ruby will throw an exception. This exception is rescued by our helper methods and an empty string will be returned. Our view is cleaner and more maintainable, and we can rest easy knowing that if for some reason we have a nil object somewhere in our expression, we'll safely catch it.