
Nested searching in Sunspot Solr


I am trying to implement a Solr-based search for a message thread. Each message can have many replies (replies can be ONLY one level deep). I want to retrieve the parent messages whose content matches the search keyword OR that have replies matching the search keyword.

E.g.:

Hello Jack
  Hello Janice
  How are you?
  ..

I am Janice
  How are you?

Welcome to the Jungle
  Nothing better to do.

Searching for Janice should return the following result set:

Hello Jack # one of the child messages matches the keyword
I am Janice # parent message matches the keyword

My model is as follows:

class Message < ActiveRecord::Base    
  belongs_to :parent, :class_name => "Message"
  has_many   :replies, :class_name => "Message", :foreign_key => :parent_id      
  # columns: content, parent_id
  searchable do
    text :content
    integer :parent_id
  end     
end

What is the DSL syntax for specifying nested, subquery-like conditions?

Edit 1

I considered creating a compound text field holding the content of a message together with the content of all its replies. But this approach is not viable in my scenario, as I have to ensure that the replies match certain additional criteria.

class Message < ActiveRecord::Base    
  belongs_to :parent, :class_name => "Message"
  has_many   :replies, :class_name => "Message", :foreign_key => :parent_id      
  belongs_to :category
  # columns: content, parent_id, category_id
  searchable do
    text :content
    integer :category_id
    integer :parent_id
  end     
end

In the above model, I want to restrict the text search to a given category.


Solution

  • The best way to accomplish what you are looking for is to denormalize the content of the replies — and any other fields you'd like to make searchable — into their parent Message.

    That's pretty straightforward to do in Sunspot. Another common scenario you might research online would be searching for a blog post based on the contents of its comments.

    One important thing to note here: because of the denormalization, you'll need an after_save hook so that replies can reindex their parent when added or updated.

    In your case, the changes might look something like this…

    class Message < ActiveRecord::Base    
      # …
    
      after_save :reindex_parent
    
      searchable do
        # …
        text :replies_content
      end
    
      def replies_content
        replies.collect(&:content).join(" ")
      end
    
      def reindex_parent
        # Top-level messages have no parent, so there is nothing to reindex.
        parent.solr_index! if parent
      end
    
    end
    

    (The text :replies_content field could also take an inline block instead of a separate method, if you want to save a few lines. That's up to you.)
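    To see why the after_save hook matters, here is a plain-Ruby sketch (no Rails or Sunspot involved) in which a Hash stands in for the Solr index. The Message class, to_document, and save(index) are hypothetical stand-ins for the ActiveRecord model, the searchable block, and save plus the reindex_parent callback. Saving a reply rewrites its parent's document, so the parent picks up the new reply text; without that step, message 1 would keep the empty replies_content it was first indexed with.

    ```ruby
    # Plain-Ruby sketch; not Rails or Sunspot. A Hash stands in for the Solr index.
    # Hypothetical model mirroring has_many :replies / belongs_to :parent.
    class Message
      attr_reader :id, :content, :parent, :replies

      def initialize(id, content, parent = nil)
        @id = id
        @content = content
        @parent = parent
        @replies = []
        parent.replies << self if parent # register as a reply of the parent
      end

      # Roughly what the searchable block indexes: content plus replies_content.
      def to_document
        { content: content, replies_content: replies.map(&:content).join(" ") }
      end

      # Stands in for save followed by the after_save :reindex_parent callback.
      def save(index)
        index[id] = to_document
        reindex_parent(index)
      end

      private

      # Top-level messages have no parent, so there is nothing to refresh.
      def reindex_parent(index)
        index[parent.id] = parent.to_document if parent
      end
    end

    index = {}
    jack = Message.new(1, "Hello Jack")
    jack.save(index) # indexed before it has any replies

    reply = Message.new(2, "Hello Janice", jack)
    reply.save(index) # also rewrites message 1's document
    # index[1][:replies_content] is now "Hello Janice"
    ```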

    There is no real change in search syntax with this approach, since all the content of the replies will get lumped into your default keyword search.
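    As a rough illustration of that keyword search, the sketch below rebuilds the question's example in plain Ruby: one document per parent, with the replies' text denormalized into replies_content, and a naive case-insensitive whole-word match standing in for Solr's text analysis. The helper names are hypothetical. It indexes only top-level messages; in the real model the replies are indexed as well, so they can show up as hits themselves unless you filter them out (for example on the parent_id field the model already indexes).

    ```ruby
    # Plain-Ruby sketch of keyword search over denormalized parent documents.
    # The question's example threads: parent content => reply contents.
    THREADS = {
      "Hello Jack" => ["Hello Janice", "How are you?"],
      "I am Janice" => ["How are you?"],
      "Welcome to the Jungle" => ["Nothing better to do."]
    }.freeze

    # One document per parent: its own text plus the joined text of its replies.
    def build_documents(threads)
      threads.map do |content, replies|
        { content: content, replies_content: replies.join(" ") }
      end
    end

    # Naive stand-in for a Solr keyword query: case-insensitive whole-word match
    # against either text field. Solr's real analysis chain differs.
    def keyword_search(documents, term)
      pattern = /\b#{Regexp.escape(term)}\b/i
      documents
        .select { |doc| doc.values_at(:content, :replies_content).any? { |text| text.match?(pattern) } }
        .map { |doc| doc[:content] }
    end

    results = keyword_search(build_documents(THREADS), "Janice")
    # results == ["Hello Jack", "I am Janice"]
    ```

    "Hello Jack" matches only through its replies_content, while "I am Janice" matches on its own content, which is the result set the question asks for.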

    If you have more specific use cases in mind, you'll need to clarify your question, but this seems like the best and simplest approach to me.

    One last note: this approach can be a bit heavy if, for example, your messages have a lot of replies, since every saved reply reindexes its parent along with the text of all its sibling replies. It's probably a good idea to make sure you're indexing asynchronously using DelayedJob or Resque. But that's a different conversation.

    Update 1: Scoping with a certain category_id

    First of all, I am assuming that each reply may have a category_id distinct from its parent's. And, to restate, you want to perform keyword matching against the parent or reply text content, and you want to scope by category.

    You have a couple of options, as I see it. I'll start with the simplest, and then describe a few likely combinations. The simplest approach would be to do a pretty basic search (don't worry about denormalization or any of that) and reconstruct your parent-child threads with ActiveRecord associations.

    @search = Message.search do
      keywords params[:q]
      with(:category_id, params[:category_id])
    end
    @messages = @search.results
    

    As you can see, scoping by category_id is pretty straightforward in Sunspot. It may be that this is the bulk of your question and I've just gone and made it more complicated than it has to be :)

    From there, some of those @messages will be parents and some will be replies. Your view can easily tell which is which and render each one accordingly.

    <% if message.parent %>
      …
    <% end %>
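    The per-message approach can be modeled in plain Ruby as well. In this sketch (hypothetical Struct and helper names; the regex only approximates Solr keyword matching), each hit must satisfy both the keyword and the category on the same message, and hits are then collapsed to their thread roots, much as checking message.parent lets you do in the view. Because the "Hello Janice" reply is in category 20, searching category 10 for Janice returns only the "I am Janice" thread.

    ```ruby
    # Plain-Ruby sketch: per-message keyword + category filter, then thread roots.
    # Hypothetical rows standing in for Message records (parent_id nil = top level).
    Message = Struct.new(:id, :content, :category_id, :parent_id)

    MESSAGES = [
      Message.new(1, "Hello Jack", 10, nil),
      Message.new(2, "Hello Janice", 20, 1),
      Message.new(3, "How are you?", 10, 1),
      Message.new(4, "I am Janice", 10, nil),
      Message.new(5, "How are you?", 10, 4)
    ].freeze

    # Stands in for `keywords term` plus `with(:category_id, category_id)`:
    # both conditions must hold on the same message.
    def scoped_search(messages, term, category_id)
      pattern = /\b#{Regexp.escape(term)}\b/i
      messages.select { |m| m.category_id == category_id && m.content.match?(pattern) }
    end

    # Maps each hit to its thread root (the parent for a reply, itself otherwise)
    # and drops duplicates when several hits belong to the same thread.
    def thread_roots(hits, messages)
      by_id = messages.map { |m| [m.id, m] }.to_h
      hits.map { |m| m.parent_id ? by_id.fetch(m.parent_id) : m }.uniq
    end

    in_10 = thread_roots(scoped_search(MESSAGES, "Janice", 10), MESSAGES).map(&:content)
    in_20 = thread_roots(scoped_search(MESSAGES, "Janice", 20), MESSAGES).map(&:content)
    # in_10 == ["I am Janice"], in_20 == ["Hello Jack"]
    ```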
    

    There are a few other approaches here, depending on the exact nature of your requirements. The above may be good enough, so I won't detail them here. But if you continue to pursue denormalization, you can also index a multi-valued integer field holding the category_ids of all of a message's replies. Something like integer :reply_category_ids, :multiple => true.

    This latter approach would provide looser matches against the message thread as a whole, which may or may not be worth the complexity of denormalizing, depending on your app. I'll leave the syntax to you; it mostly follows from my previous examples.
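    To make the "looser" part concrete, here is a plain-Ruby sketch of the denormalized variant next to the per-message check. It assumes a thread qualifies when the keyword appears anywhere in the thread and the category appears on the parent or on any reply (one way a filter on a multi-valued reply_category_ids field, combined with the parent's category_id, could behave); all names are hypothetical. With the denormalized document, a thread can match category 10 purely because the keyword sits in a reply from category 20.

    ```ruby
    # Plain-Ruby sketch comparing a denormalized (thread-level) filter with a
    # per-message filter. Hypothetical records; no Sunspot involved.
    Reply = Struct.new(:content, :category_id)
    Parent = Struct.new(:content, :category_id, :replies)

    def mentions?(text, term)
      text.match?(/\b#{Regexp.escape(term)}\b/i)
    end

    # What the denormalized searchable block would hold for one parent:
    # replies_content (text) and reply_category_ids (multi-valued integer).
    def to_document(parent)
      {
        content: parent.content,
        category_id: parent.category_id,
        replies_content: parent.replies.map(&:content).join(" "),
        reply_category_ids: parent.replies.map(&:category_id).uniq
      }
    end

    # Assumed thread-level rule: keyword anywhere in the thread AND category on
    # the parent or on any reply. The two need not come from the same message.
    def loose_match?(doc, term, category_id)
      keyword_hit = mentions?(doc[:content], term) || mentions?(doc[:replies_content], term)
      category_hit = doc[:category_id] == category_id || doc[:reply_category_ids].include?(category_id)
      keyword_hit && category_hit
    end

    # Per-message rule: some single message satisfies both conditions.
    def strict_match?(parent, term, category_id)
      [parent, *parent.replies].any? do |m|
        m.category_id == category_id && mentions?(m.content, term)
      end
    end

    jack = Parent.new("Hello Jack", 10, [Reply.new("Hello Janice", 20), Reply.new("How are you?", 10)])

    loose = loose_match?(to_document(jack), "Janice", 10) # true
    strict = strict_match?(jack, "Janice", 10)            # false
    ```

    Whether that looser behavior is acceptable depends on whether "the thread touches this category" is good enough for your app, or whether the matching reply itself must be in the category.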

    As you can see, there are a few permutations here, depending on when and where you want to scope against that category. Hopefully the examples above give you enough to work out the specifics for your app.