Picking the right search functionality image

Picking the right search functionality

Author: Charli

Nov 17, 2019 - Reading time: 11 min

Search functionality is often a core part of a webapp. In Ruby on Rails you have several options when you want to add this kind of functionality. In this short article I'm tried out a few of them.
I have implemented searchkick with elasticsearch in earlier projects, but it is often overkill, especially when you want to run the app on a 5$ droplet on digital ocean.
So, i was looking to see what other options there were and decided to try some of them out.

When I try out new thing I often spin up a small rails app and get my hands dirty right away, so here it goes.
First i made a generic app with just an article model with a title and content.
Then I implemented each search on their own separate branch.
Now I can test which one i like the better inside the same app.

The ones I wanted to try out is:

https://github.com/Casecommons/pg_search
https://github.com/activerecord-hackery/ransack
https://github.com/pat/thinking-sphinx
https://github.com/sunspot/sunspot
https://github.com/ankane/searchkick

Starting with pg_search

In my example i'll use the “scope” version, which is the lightweight implementation. The reason why I choose the simple version is mainly for simplicity, but also because I just wanted to search for records inside a single model or table. The advanced feature will allow you to search in multiple models.
You can read more about that in the docs.

It was actually a very fast implementation.

I added the gem:

#Gemfile
gem 'pg_search'

And included pg in the model (articles)

#article.rb
class Article < ApplicationRecord
  include PgSearch
end

Then I added the search configuration:

#article.rb
class Article < ApplicationRecord
  include PgSearch
  pg_search_scope :search,
    against: { title: 'A', body: 'B' },
      using: {
        tsearch: {
          prefix: true,
          any_word: true,
          dictionary: 'english'
        }
      }
end

I search for the 2 columns title and body, but the ‘A’ and ‘B’ is to provide some ranking between the 2. Then tsearch reference using the text search. The dictionary can be set to english or simple, I choose the english version in order to take advantage of the “stemming” functionality. So if someone searches for “Products”, the result will return objects with the “product” in as well.
Setting prefix to true will add :* to the end of your queries.
Any_word is also set to true, because I want the search to return all records that contains any of the words that the user searches for. If it's not set to true, then it will consider the entire input text as one text string.

On the article index page I added the search form.

#index.html.erb
  <%= form_tag(articles_path, method: "get") do %>
    <%= text_field_tag :search, nil, placeholder: "Search articles", class: "form-control" %>
    <%= submit_tag "", style: "display: none;" %>
  <% end %>

Display none is just to execute the search when you hit enter, instead of clicking the search button.
And finally the index method of the articles controller:

#articles_controller.rb
  def index
    if params[:search].present?
      @articles = Article.search(params[:search])
    else
      @articles = Article.all
    end
  end

The index method contains a simple if statement where it populates the instance variable @articles with either all articles or the result of the .search method with the user inputted params.
The .search method comes from pg_search.

So, how is the output?
I checked in the rails server after I made a test-search in order to have a look at the query.

SELECT "articles".* FROM "articles" INNER JOIN (SELECT "articles"."id" AS pg_search_id, (ts_rank((setweight(to_tsvector('simple', coalesce("articles"."title"::text, '')), 'A') || setweight(to_tsvector('simple', coalesce("articles"."body"::text, '')), 'B')), (to_tsquery('simple', ''' ' || 'computer' || ' ''' || ':*')), 0)) AS rank FROM "articles" WHERE (((setweight(to_tsvector('simple', coalesce("articles"."title"::text, '')), 'A') || setweight(to_tsvector('simple', coalesce("articles"."body"::text, '')), 'B')) @@ (to_tsquery('simple', ''' ' || 'computer' || ' ''' || ':*'))))) AS pg_search_906c7ff47cd7c682246c27 ON "articles"."id" = pg_search_906c7ff47cd7c682246c27.pg_search_id ORDER BY pg_search_906c7ff47cd7c682246c27.rank DESC, "articles"."id" ASC

I expected it to be something like:
SELECT * FROM “articles” WHERE (”title” LIKE %computer% OR “body” LIKE %computer%)
But what really happens is an inner join that is using ranking and the tsvector to both figure out the return result, but also rank it in the most relevant order. It is clear that the title is prioritized higher than the body.

One issue i faced during the implementation of this gem was the option to highlight the keyword in the result text. It was actually working fine, but when showing the result in the view it will concatenate the output. And since the result is a combination of the title and body of the article, it will display them both in 1 string. This is annoying, because I want to add the title as a clickable link that sends the user to the showpage. It is still possible, but then if I show some of the body text (or result text), the string will be concatenated and will have both title and body. So it will look something like this:
Title of article
Title of article + body of article
But that is a minor thing that can be fixed.

Let’s have a look at the search-time for a normal query.
In order to get a result we can work with, I added a few more records in the seedsfile.
Just did a 1000.times do block for all of them.
Before the additional records the search time was: Article Load (43.8ms)

Now after the 1000 extra records it is: Article Load (13253.3ms)

That is pretty crazy and we have to deal with this before the database grows.
So what options do we have?

This is not a huge app, but a natural next step would be to add a index on the searchable columns. The more records the search is going through, the slower the search will be … naturally.
Depending on the use-case it can be a problem if we have, lets say 1000 records.
One option is to limit the search to the title. That isn’t a viable solution in this case, but adding a index to the columns would provide the search speed.
We need to have index on the body and title column. (should be GIN)
It is a bit harder than i thought it would be.
Read:
https://github.com/Casecommons/pg_search/wiki/Building-indexes
http://semaphoreci.com/blog/2017/06/21/faster-rails-indexing-large-database-tables.html
https://about.gitlab.com/2016/03/18/fast-search-using-postgresql-trigram-indexes/

Next up is Ransack search

Ransack do have 2 options as well, an advanced and a simple version. It is very similar to pg search, where the advance version lets you search across multiple models and the simple version is limited to 1 model. In our case the simple version will do.

I started by adding the gem

#Gemfile
gem 'ransack', github: 'activerecord-hackery/ransack'

There are lots of configuration options for this gem, both a simple and advanced mode. For now I choose the simple mode, since we are only searching in 1 model.

I added the search to the index action on the articles controller.

#articles_controller.rb
  def index
    @q = Article.ransack(params[:q])
    @articles = @q.result(distinct: true)
  end

And then the search form on the index page

#index.html.erb
<h3>Search</h3>
  <div class="form-group">
    <%= search_form_for @q do |f| %>
      <%= f.label :title_or_body_cont %>
      <%= f.search_field :title_or_body_cont, class: 'form-control' %>
      <%= f.submit "Search", class: 'btn btn-default' %>
    <% end %>
  </div>


<h1>Article index page</h1>

<table>
  <tr>
    <td><%= sort_link @q, :title, "Title" %></td>
    <td><%= sort_link @q, :body, "body" %></td>
  </tr>
  <% @articles.each do |article| %>
    <tr>
      <td><%= article.title %></td>
      <td><%= article.body %></td>
    </tr>
  <% end %>
</table>

And it worked. Out of the box like that.
It is something like a 5-minute implementation.

Checking the output on a search, gives me the following:

SELECT DISTINCT "articles".* FROM "articles" WHERE ("articles"."title" ILIKE '%computer%' OR "articles"."body" ILIKE '%computer%')

It is pretty straight forward.
iLIKE is to make case insensitive
But like with the pg_search, let's take a look at the search time.
With the normal 25 articles, the query is very fast:
Article Load (7.3ms)
But adding the 1000.times do block will give a more accurate indication of the search time.
With the extra records added the search-time is now at: Article Load (697.9ms)

Comparison of ransack and pg_search

Compared to pg_search the out-of-the-box implementation will return results faster.
Comparing the 2 without the extra records: 7.3ms for ransack and 43.8ms for pg_search.
And the big search 697.9ms for ransack and 13253.3ms for pg. It

There is a huge difference, and in order to understand why, we need to take a look at the queries generated.

Compare the 2 queries.
Ransack:

SELECT DISTINCT "articles".* FROM "articles" WHERE ("articles"."title" ILIKE '%computer%' OR "articles"."body" ILIKE '%computer%')

Pg_search:

SELECT "articles".* FROM "articles" INNER JOIN (SELECT "articles"."id" AS pg_search_id, (ts_rank((setweight(to_tsvector('simple', coalesce("articles"."title"::text, '')), 'A') || setweight(to_tsvector('simple', coalesce("articles"."body"::text, '')), 'B')), (to_tsquery('simple', ''' ' || 'computer' || ' ''' || ':*')), 0)) AS rank FROM "articles" WHERE (((setweight(to_tsvector('simple', coalesce("articles"."title"::text, '')), 'A') || setweight(to_tsvector('simple', coalesce("articles"."body"::text, '')), 'B')) @@ (to_tsquery('simple', ''' ' || 'computer' || ' ''' || ':*'))))) AS pg_search_906c7ff47cd7c682246c27 ON "articles"."id" = pg_search_906c7ff47cd7c682246c27.pg_search_id ORDER BY pg_search_906c7ff47cd7c682246c27.rank DESC, "articles"."id" ASC

Adding index to pg_search
Adding an index to the search is a bit complicated.
I did bring down the search from 13.253.3ms to 692.0ms. So, this cut the search time in half.
I just followed this article: https://robots.thoughtbot.com/optimizing-full-text-search-with-postgres-tsvector-columns-and-triggers

Thinking Sphinix

Not going to test this one since its using mysql as database, instead of postgres, which is not supported on heroku, but it is supported on digital ocean. I will update the article with a test of this one at a later time.

Sunpot Solr

Actually I want to try this one too, but it requires a server, so this will be another day. When I do an article with elastic search (searchkick), then i'll compare it to sunspot.

Conclusion

It is hard to choose and the 2 I ended up with testing are very similar.
My impression is that pg_search does have a few more configuration options, like weight, highlight, prefix and dictionary, which is language and it is a rather important one for this current scenario.
The app I am building will contain articles in a different language, so english is not the default.
There is an option to add the selected language in both gems, but after checking the documentation i realised that in ransack the language I want to use is not support.
Or it is supported through ransack:local, but not available through https://www.localeapp.com/projects/2999

So I picked pg_search for this app and I can now move on with knowledge of the different search-options the gems provide. I will in the future take a look at the solr and elasticsearch options, just to be prepared if i eventually will make a bigger project with a more advanced search.

Regarding the index, I decided to avoid adding extra complexity here at first, and then i’ll monitor the app, usage and performance and then add the index later, if needed.
It is not a problem to add in production, since i’ll just add the column and then run the
Article.find_each(&:touch)
Command in the console in production after i added the index configuration.