Search code examples
sqlruby-on-railsactiverecordgreatest-n-per-group

ActiveRecord: How to order and retrieve records in a greatest-n-per-group situation


I'm stuck with a classic greatest-n-per-group problem, where a cat can have many kittens, but I'm usually just interested in the youngest.

I already do know how to build a scope and a has_one relation for the Cat.

My question: Is there a way to...

  1. list all cats' names together with their youngest kittens' names...
  2. while at the same time ordering them by their respective youngest kitten's name...

...using just a single SELECT under the hood?

What I got so far:

class Cat < ApplicationRecord
  has_many :kittens

  has_one :youngest_kitten, -> { merge(Kitten.youngest) }, foreign_key: :cat_id, class_name: :Kitten

  scope :with_youngest_kittens, lambda {
                                joins(:kittens)
                                  .joins(Kitten.younger_kittens_sql("cats.id"))
                                  .where(younger_kittens: { id: nil })
                              }
end

class Kitten
  belongs_to :cat
  scope :youngest, lambda {
    joins(Kitten.younger_kittens_sql("kittens.cat_id"))
      .where(younger_kittens: { id: nil })
  }

  def self.younger_kittens_sql(cat_field_name)
    %{
      LEFT OUTER JOIN kittens AS younger_kittens
      ON younger_kittens.cat_id = #{cat_field_name}
        AND younger_kittens.created_at > kittens.created_at
    }
  end
end

When I run Cat.with_latest_kittens.order('kittens.name').map(&:name) everything looks fine: I get all the cats' names with just a single SELECT.

But when I run Cat.with_latest_kittens.order('kittens.name').map {|cat| cat.youngest_kitten.name}, I get the right result too, but a superfluous additional SELECT per cat is executed. Which is just logical, because the with_youngest_kittens doesn't know it should populate youngest_kitten. Is there a way to tell it or am I going about this all wrong?


Solution

  • I think adding an includes to your :with_youngest_kittens scope will fix the problem. Try changing the scope to

      scope :with_youngest_kittens, lambda {
        includes(:youngest_kitten)
          .joins(:kittens)
          .joins(Kitten.younger_kittens_sql("cats.id"))
          .where(younger_kittens: { id: nil })
      }
    

    This should prevent Rails from making a separate database query for every kitten.