Search code examples
sqlruby-on-rails-4subqueryselect-n-plus-1

Rails 4, ActiveRecord SQL subqueries


A :parent has_many :children and I am trying to retrieve the age of the oldest child for a parent as an attribute on the parent. I'm open to any solution that accomplishes that efficiently.

The reason I'm trying to do a subquery is to let the DB do the n+1 overhead instead of making a separate DB request for each parent. Both are inefficient, but using a subquery seems more efficient.

# attributes: id
class Parent < ActiveRecord::Base
  has_many :children

  # Results in an (n+1) request
  def age_of_oldest_child
    children.maximum(:age)
  end
end

# attributes: id, parent_id, age
class Child < ActiveRecord::Base
  belongs_to :parent
end

Sample use case:

parent = Parent.first.age_of_oldest_child # => 16

parents = Parent.all
parents.each do |parent|
  puts parent.age_of_oldest_child # => 16, ...
end

My attempt:

sql = "
  SELECT 
    (SELECT
      MAX(children.age)
      FROM children
      WHERE children.parent_id = parents.id
    ) AS age_of_oldest_child
  FROM
    parents;
"

Parent.find_by_sql(sql)

This returns an array of maximum ages for all parents; I would like to restrict this to just 1 parent or also have it included as an attribute on a parent when I retrieve all parents.

Update 2015-06-19 11:00

Here is a workable solution I came up with; are there more efficient alternatives?

class Parent < ActiveRecord::Base
  scope :with_oldest_child, -> { includes(:oldest_child) }

  has_many :children
  has_one :oldest_child, -> { order(age: :desc).select(:age, :parent_id) }, class_name: Child

  def age_of_oldest_child
    oldest_child && oldest_child.age
  end
end

Example usage:

# 2 DB queries, 1 for parent and 1 for oldest_child
parent = Parent.with_oldest_child.find(1)

# No further DB queries
parent.age_of_oldest_child # => 16

Solution

  • Here are two ways of doing it:

    parent.rb

    class Parent < ActiveRecord::Base
      has_many :children
    
      # Leaves choice of hitting DB up to Rails
      def age_of_oldest_child_1
        children.max_by(&:age)
      end
    
      # Always hits DB, but avoids instantiating Child objects
      def age_of_oldest_child_2
        Child.where(parent: self).maximum(:age)
      end
    end
    

    The first method uses the enumerable module's max_by functionality and calls age on each object in the collection. The advantage of doing it this way is you leave the logic of whether or not to hit the database to Rails. If the children are already instantiated for some reason, it won't hit the database again. If they are not instantiated, it will perform a select query, load them into memory in a single query (thus avoiding N+1) and then go through each one calling its age method.

    The two disadvantages, however, are that if the underlying data has changed since the children were instantiated, it will still use the outdated result (this could be avoided by passing :true when calling :children. Also, it is loading every single child into memory first, then counting them. If the child object is large and/or a parent has a large number of children, that could be memory-intensive. It really depends on your use case.

    If you decided you wanted to avoid loading all of those children, you could do a straight DB hit every time using the count query depicted in method 2. In fact, you would probably actually want to relocate that to a scope in Child as perhaps some would consider it an anti-pattern to do queries like that outside of the target model, but this just makes it easier to see for the example.