Search code examples
ruby-on-railssphinxthinking-sphinx

Index virtual attribute Thinking Sphinx


I'm wondering how one might index a virtual attribute on a model with Thinking Sphinx. Given a Project model and some instance method which returns a boolean derived by some other information from another model, say Users, whose attribute is derived and is not on the project table in the database.

For example, suppose we have a method is_user_eligible such that we can query Project.first.is_user_eligible, and get a true or false response. This works in the ORM already.

How can I index this virtual attribute with Thinking Sphinx? I'm able to index virtual attributes in my django project which is on Haystack backed by Elasticsearch. I facilitated this by having a @property decorator on the model method. I figured I should be able to do this with Rails/ThinkingSphinx too, yet I get all sorts of bizarre SQL errors when trying to index. I've tried all sorts of various constructions in setting up my index (e.g. has -vs- indexes) and all result in some sort of SQL error while indexing.

Is this possible with Thinking Sphinx? If so, how can I index a virtual attribute?


Solution

  • You've made it clear that the value is not available as a column on the projects table, but is it on an associated model instead? If so, then you could refer to it via the association:

    has user.is_eligible, :as => :is_user_eligible
    

    However, if it's not a column, but can be determined within the context of the SQL query, then you can use a SQL snippet as the attribute definition (I know my example is rather contrived, but should give you some idea):

    has "(users.foo = 'bar' || users.baz = 'qux')",
      :as   => :is_user_eligible,
      :type => :boolean
    

    If you're referring to associations that aren't used elsewhere in the index definition, you can force the references, or provide a SQL join statement:

    join users
    # or through more than one association:
    join users.addresses
    # or via your own custom join:
    join "INNER JOIN users ON users.project_id = projects.id"
    

    But if you cannot determine this value via SQL at all, then the only way to do this with Thinking Sphinx is use real-time indices instead of SQL-backed indices. What this then means is that instead of referring to associations and columns in your index definitions, you refer to methods instead. So, your attribute would become:

    has is_user_eligible, :type => :boolean
    

    The type must be specified - SQL indices can guess attribute types due to column types, but real-time indices don't have that reference point.

    I realise the link to the real-time indices feature is a blog post I wrote over two years ago. However, the feature certainly works - I and others have been using it in production for quite some time (including with Flying Sphinx).

    On the topic of has vs indexes: if you want to use the value as a filter or for sorting, then it must be an attribute, and thus you should use the has method. However, if it's textual data that you expect search queries to match on, then it should be a field, and thus use the indexes method.

    Certainly I'd recommend switching to real-time indices anyway: it removes the need for deltas and you get up-to-date Sphinx records without needing to run 'ts:index' regularly (or at all - use ts:generate should your data end up in an out-of-date state). But make sure you switch all index definitions to real-time, instead of having some real-time and others SQL-backed.