Search code examples
ruby-on-railsruby-on-rails-3sphinxthinking-sphinx

Sum and Group values by category using Thinking Sphinx


I have this model:

class Job < ActiveRecord::Base
  has_many :groups, class_name: 'JobGroup', inverse_of: :job, dependent: :destroy
  has_many :placements, through: :groups
end

Group model has size attribute and Placement has nameattribute. Then I have...

Job 1 with 3 groups:

  1. with placement_1 and size: 5
  2. with placement_2 and size: 10
  3. with placement_3 and size: 5

And Job 2 with 2 groups:

  1. with placement_1 and size: 5
  2. with placement_2 and size: 15

With thinking-sphinx I want to group by placement_name and sum sizes. Something like this:

[
   "placement_names",
   {
      "placement_name_1": 10,
      "placement_name_2": 25,
      "placement_name_3": 5
   }
]

What do I need to put inside my ThinkingSphinx::Index.define :job, :with => :active_record do #... end definition to make this work?

I know that I can put something like this inside my definition: indexes placements.name, as: :placement_names, facet: true to get the job count by placement name, but I don't know how to sum values grouped by another attribute.

UPDATE:

With the Pat's answer the solution would be:

job_ids = Job.search_for_ids
q = JobGroup.select(['placements.name', 'placements.id as placement_id', 'SUM(job_groups.size) as size'])
q = q.joins(:placement)
q = q.where(job_id: job_ids)
q = q.group('placement_id')

Solution

  • What you're asking for isn't really possible with Sphinx, for two reasons:

    • Sphinx returns documents, not aggregations.
    • Sphinx also has no concept of hashes/dictionaries/key-value pairs. If you're indexing your Job model, you can have an array of placement sizes per document (job record), but you can't associate these with specific placement names within the context of a job.

    An alternative could be to perform a search to get the job ids back matching whatever the query is, and then run the aggregation in your database instead for just those jobs? Thinking Sphinx has the search_for_ids method which takes the same arguments as search, but returns an array of ids instead of ActiveRecord objects (which can be useful from a performance perspective).