Search code examples
mysqlsqlruby-on-railsactiverecordransack

Group by bucket (with NULL values)


I have the following tables:

  1. entries (id, title, text, duplicate_bucket_id)
  2. duplicate_buckets (id, comment)

So every entry can be in a duplicate bucket. Now I want to get all entries without the duplicates:

SELECT MIN(id) FROM entries GROUP BY duplicate_bucket_id

The problem with this query is that it also groups all the entries without a duplicate_bucket_id to only one entry with NULL.

So I need something like this

(SELECT MIN(id) FROM entries WHERE duplicate_bucket_id IS NOT NULL GROUP BY duplicate_bucket_id)
UNION
(SELECT id FROM entries WHERE duplicate_bucket_id IS NULL)

This query gives me the correct result, but ActiveRecord can't use UNIONs.

Alternatively, I can use this query with a subquery:

SELECT * FROM entries WHERE duplicate_bucket_id IS NULL OR id IN 
 (SELECT MIN(id) FROM entries WHERE duplicate_bucket_id IS NOT NULL GROUP BY duplicate_bucket_id )

In this query, I must place additional where-clauses in AND outside of the subquery. So the query gets quite complicated and I don't know yet, how to use the Ransack Gem with such a query...

The query would be simple, if every "entry" would be in a "duplicate_bucket" - buckets of size 1 (I could use *SELECT * FROM entries GROUP BY duplicate_bucket_id*). But I want to avoid to have entries in a duplicate_bucket, if the entry don't have a duplicate. Is there a simple query (no unions, no subqueries) to get all entries without their duplicates?

Dataset

entries(id, title, text, duplicate_bucket_id)

1, 'My title', 'Bla bla', 1
2, 'Hello', 'Jaha', 1
3, 'Test', 'Bla bla', 1
4, 'Foo', 'Bla', NULL
5, 'Bar1', '', 2
6, 'Bar2', '', 2

duplicate_buckets (id, comment)

1, 'This bucket has 3 entries'
2, 'Bar1 and Bar2 are duplicates!'

Result

1, 'My title', 'Bla bla', 1
4, 'Foo', 'Bla', NULL
5, 'Bar1', '', 2

Solution

  • ANSI/ISO SQL:

    select      * 
    from        entries as e1
    where       not exists (select null from entries as e2 where e2.duplicate_bucket_id = e1.duplicate_bucket_id and e2.id < e1.id)
    ;
    

    MySQL Terrible, Horrible, No Good, Very Bad syntax

    select      * 
    from        entries 
    group by    coalesce(-duplicate_bucket_id,id)
    ;