Search code examples
sqlruby-on-railsrubythinking-sphinx

Thinking sphinx - complex query with intersect and subqueries


I have such models: 

class Machine < ActiveRecord::Base
  has_many :property_values, dependent: :destroy
end

class PropertyValue < ActiveRecord::Base
  belongs_to :property, touch: true
  belongs_to :machine, touch: true
end

Schema of PropertyValue model:

  create_table "property_values", force: true do |t|
    t.integer  "property_id"
    t.integer  "machine_id"
    t.datetime "created_at",  null: false
    t.datetime "updated_at",  null: false
    t.float    "value_int"
    t.string   "value_str"
  end

In my Machine model I have such method:

class Machine < ActiveRecord::Base
  def filter_search(params, order_query = 'machines.updated_at desc')
   if params[:properties].present?
    query = []
    params[:properties].each_pair do |p|
      property_id = p.first.split('_').last
      p.last.each do |type, value|
        if value.present?
          select_join_query = "(SELECT machine_id FROM property_values AS pv LEFT OUTER JOIN machines ON machines.id = pv.machine_id
                    WHERE pv.property_id = #{property_id}"
          if type == 'str'
            query << select_join_query + " AND pv.value_str ILIKE '%#{value.mb_chars}%')"
          elsif type == 'gt' || type == 'lt'
            query << select_join_query + " AND pv.value_int #{type == 'gt' ? '>=' : '<='} #{value})"
          elsif type == 'float'
            floated_val = value.to_f                                                            # because input value may be integer.
            query << select_join_query + " AND pv.value_int = #{floated_val})"
          end
        end
      end
    end

    machines = machines.where("machines.id IN (#{query.join('intersect')})").references(:property_value) if query.present?
   end
  end
end

Idea of this method is to perform separate queries for different pairs of properties and and find machines, which are satisfy these conditions via intersection. So, if I have such input properties:

{"properties"=>{"property_7"=>{"str"=>"some data"}, "property_6"=>{"gt"=>"1", "lt"=>"1000"}, "property_5"=>{"gt"=>"3", "lt"=>"800"}}}

Then it generates a query, which looks something like this:

(machines.id IN ((SELECT machine_id FROM property_values AS pv LEFT OUTER JOIN machines ON machines.id = pv.machine_id
                        WHERE pv.property_id = 7 AND pv.value_str ILIKE '%some data%')intersect(SELECT machine_id FROM property_values AS pv LEFT OUTER JOIN machines ON machines.id = pv.machine_id
                        WHERE pv.property_id = 6 AND pv.value_int >= 1)intersect(SELECT machine_id FROM property_values AS pv LEFT OUTER JOIN machines ON machines.id = pv.machine_id
                        WHERE pv.property_id = 6 AND pv.value_int <= 1000)intersect(SELECT machine_id FROM property_values AS pv LEFT OUTER JOIN machines ON machines.id = pv.machine_id
                        WHERE pv.property_id = 5 AND pv.value_int >= 3)intersect(SELECT machine_id FROM property_values AS pv LEFT OUTER JOIN machines ON machines.id = pv.machine_id
                        WHERE pv.property_id = 5 AND pv.value_int <= 800)))

Is it possible somehow to write the analogue via Thinking sphinx, which would perform such complex query? I have no idea how to do that.


Solution

  • You'd need to split this into two parts in Sphinx as well. The first would be to search on PropertyValue (as in a Sphinx index for Machine, a value's property ids and values cannot be linked together - there's no data type of a hash in Sphinx).

    And within that search upon PropertyValue, you'd need to compose a custom SELECT clause...

    value_ids = PropertyValue.search_for_ids(
      :select => "*, ((property_id = 6 AND value_int > 1) OR (...)) AS matching",
      :with => {:matching => true}
    )
    

    And then you can use those value_ids to query for the machines you want. If you need all of the values to be present for a specific machine, then you can use the :with_all option:

    Machine.search :with_all => {:property_value_ids => value_ids}
    

    This ensures machines that match one (or more) but not all of the ids are not return.

    This may require some serious modification, as I don't quite grok the intersect stuff you're doing, but hopefully it's enough to put you on the right path.