Sort by a list of IDs queried using MySQL's IN() function
Thursday, June 26th, 2008I run into this problem from time to time, and I've finally put enough energy into it that I want to commit it to memory through the blog. The most common occurrence of this in my world is when performing a search using something like Lucene or Sphinx. Here's what usually happens:
I perform a search using my engine of choice. Sphinx, for example, will give me back a hash of results formatted like this:
Array
(
[matches] => Array
(
[10546] => Array
(
[weight] => 3
[attrs] => Array
(
[feed_id] => 624
[published_at] => 1213713545
)
)
[14154] => Array
(
[weight] => 3
[attrs] => Array
(
[feed_id] => 583
[published_at] => 1213801410
)
)
...
)
[time] => 0.000
...
)
Since I do not store my full-text fields in my index, I then have to take the IDs from the matches array and query the database to get at my data. The query is simple:
SELECT * FROM items WHERE id IN (10546, 14154, …);
Unfortunately, that query will not necessarily return the rows in the order I provided. This is an easy fix if I'd like to sort my results on a column in my database, since I can just add a quick ORDER BY clause on say, the date the item was created. But if I searched the database for results sorted by relevance, the database really doesn't know anything about how to sort that way. I need a way to tell MySQL to sort the results in the order I've supplied my IDs, which leads us to at least two functions: FIND_IN_SET and FIELD.
Both functions work the same way: Given a string (or column) as the first argument, they return the 1-based index — or zero if not found — of the argument in either the list of additional arguments (FIELD) or the second argument, a comma-delimited string (FIND_IN_SET). For my purposes, then, my query looks like this:
# Using FIELD
SELECT * FROM items WHERE id IN (10546, 14154, …) ORDER BY FIELD(id, 10546, 14154, …);
# Using FIND_IN_SET
SELECT * FROM items WHERE id IN (10546, 14154, …) ORDER BY FIND_IN_SET(id, '10546,14154,…');
Both functions will return the same results, sorted the same way. I went so far as to profile both, too, just to make sure I was using the best function. Granted, when I profiled this using MySQL's built-in profiler, my query was pretty simple (as it should be, in my opinion). Your mileage may vary:
+----------+------------+------------------------------+ | Query_ID | Duration | Sorting result | Query | +----------+------------+------------------------------+ | 1 | 0.01572400 | 0.001129 | FIND_IN_SET | | 2 | 0.00241400 | 0.00127 | FIELD | | 3 | 0.00194700 | 0.00114 | FIND_IN_SET | | 4 | 0.00193100 | 0.001121 | FIELD | | 5 | 0.00192300 | 0.001116 | FIND_IN_SET | | 6 | 0.00192100 | 0.001101 | FIELD | +----------+------------+------------------------------+
Note that the query result is a hybrid between show profiles and show profile for query [id], and I've ripped out the actual query because all I care about is the sorting function being used (the query is the same otherwise). The duration on the whole was lower in each subsequent query because of MySQL's query cache, but since I really wasn't interested in that anyway — just the sorting duration — it doesn't really matter.
The profiling I did was ridiculously contrived, but it did go to show me, based on the miniscule differences, that it most likely will not matter which function I use (although FIND_IN_SET ultimately won out over my exhaustive six-query profile).