Search code examples
ruby-on-railspostgresqlactiverecord

Ordering by associated value with from outer left join yields duplicates


I am getting duplicate and sometimes unordered results with doing a LEFT OUTER JOIN.

I think the issue is that there are many artwork_colors that can match the WHERE condition. But I just want the artwork's artwork_color with the highest pixel_percent (that also matches the WHERE clause), and to use that value for ordering the artworks. And of course for each artwork to appear only once in the results.

How do I rewrite the ActiveRecord query to order the results correctly while avoiding duplicates? (This query must be made with ActiveRecord so that I can plug it into an existing pagination system, etc.)

The query, simplified for this question:

# color_ids is an group of ids of colors that are similar to the colors being searched for
# for example if you search for FF0000 it will include similar shades of red

SELECT DISTINCT
    "artwork_colors"."pixel_percent",
    artworks.*
FROM
    "artworks"
    LEFT OUTER JOIN "artwork_colors" ON "artwork_colors"."artwork_id" = "artworks"."id"
WHERE
    "artwork_colors"."color_id" IN(106, 108, 119, 120, 128, 133, 156, 160)
ORDER BY
    "artwork_colors"."pixel_percent" DESC
LIMIT 120 OFFSET 0;

The original query in ActiveRecord:

artworks
  .includes(:artwork_colors)
  .where('artwork_colors.color_id': color_ids)
  .order(pixel_percent: :desc)
  .select('artworks.*', 'artwork_colors.pixel_percent')
  .distinct

Relevant models and tables:

class Artwork < ApplicationRecord
  has_many :artwork_colors, dependent: :destroy
  has_many :colors, through: :artwork_colors
end

class ArtworkColor < ApplicationRecord
  belongs_to :artwork
  belongs_to :color
end

CREATE TABLE public.artwork_colors (
  id bigint NOT NULL,
  pixel_percent double precision, # this is the desired sort column
  artwork_id bigint,
  color_id bigint
);

class Color < ApplicationRecord
  # These colors were extracted from the Artwork images by Amazon Rekognition, an image analysis tool
  has_many :artwork_colors, dependent: :destroy
  has_many :artworks, through: :artwork_colors
end

# h s l are hue, saturation, lightness (the color value)
CREATE TABLE public.colors (
  id bigint NOT NULL,
  h double precision,
  s double precision,
  l double precision
);

Solution

  • I used a little Ruby to kind of cheat and get the order I needed, and then returned an ActiveRecord collection that just loads an ordered list of the desired artwork ids. This might not work in all use cases, just be warned.

        color_ids = Color.where(
          h: hue_range(:h, 30),
          s: color_range(:s, similarity * 0.5),
          l: color_range(:l, similarity)
        ).pluck(:id)
    
        artwork_ids = artworks
                      .includes(:artwork_colors)
                      .where('artwork_colors.color_id': color_ids)
                      .order(pixel_percent: :desc)
                      .limit(1500)
                      .pluck(:id)
                      .uniq
    
        # preserve the array order https://stackoverflow.com/a/9475755/8749164
        ordered_ids = artwork_ids
                      .map { |id|  "artworks.id=#{id} DESC" }
                      .join(', ')
    
        Artwork
          .where(id: artwork_ids)
          .order(Arel.sql(ordered_ids))