Search code examples
ruby-on-railspostgresqlruby-on-rails-3activerecord

Custom query to fetch all entries of a table and that only contains first of many duplicates based on a specific column


I have a Location model and the table looks like

id name vin ip_address created_at updated_at
0 default 0 0.0.0.0/0 2021-11-08 11:54:26.822623 2021-11-08 11:54:26.822623
1 admin 1 10.108.150.143 2021-11-08 11:54:26.82885 2021-11-08 11:54:26.82885
2 V122 122 10.108.150.122 2021-11-08 11:54:26.82885 2021-11-08 11:54:26.82885
3 V123 123 10.108.150.123 2021-11-08 11:54:26.82885 2021-11-08 11:54:26.82885
4 V124 124 10.108.150.124 2021-11-08 11:54:26.82885 2021-11-08 11:54:26.82885
5 V122 122 10.108.150.122 2021-11-08 11:54:26.82885 2021-11-08 11:54:26.82885
6 V125 122 10.108.150.125 2021-11-08 11:54:26.82885 2021-11-08 11:54:26.82885

My method in the Location model

   def self.find_all_non_duplicate
     return self.find(:all, :conditions => "id <> 1")
   end

I want to fetch all entries of the locations table except the entry with id = 1 and that contains only the first entry of many duplicates based on the column ip_address.

Since ip_address of id = 2 and id = 5 is duplicate. I want to keep the first entry of many duplicates i.e., id = 2.

The expected result is

id name vin ip_address created_at updated_at
0 default 0 0.0.0.0/0 2021-11-08 11:54:26.822623 2021-11-08 11:54:26.822623
2 V122 122 10.108.150.122 2021-11-08 11:54:26.82885 2021-11-08 11:54:26.82885
3 V123 123 10.108.150.123 2021-11-08 11:54:26.82885 2021-11-08 11:54:26.82885
4 V124 124 10.108.150.124 2021-11-08 11:54:26.82885 2021-11-08 11:54:26.82885
6 V125 122 10.108.150.125 2021-11-08 11:54:26.82885 2021-11-08 11:54:26.82885

The entries with id's 1 and 5 to be ignored


Solution

  • What you need is a distinct on proposed to RoR quite recently here but not yet merged, as pointed out by @engineersmnky. In a raw SQL form it would look like this:

    select distinct on (ip_address) * 
    from test 
    where id<>1 
    order by ip_address,created_at;
    

    Which would translate to RoR's

    self.where("id <> 1").distinct_on(:ip_address)
    

    or, until the new feature gets accepted:

    self.where("id <> 1").select("distinct on (ip_address) *")
    

    Full db-side test:

    drop table if exists test cascade;
    create table test (
        id serial primary key,
        name text,
        vin integer,
        ip_address inet,
        created_at timestamp,
        updated_at timestamp);
    insert into test 
    (id,name,vin,ip_address,created_at,updated_at)
    values
    (0,'default', 0,'0.0.0.0/0'::inet,'2021-11-08 11:54:26.822623'::timestamp,'2021-11-08 11:54:26.822623'::timestamp),
    (1,'admin',   1,'10.108.150.143'::inet,'2021-11-08 11:54:26.82885'::timestamp,'2021-11-08 11:54:26.82885'::timestamp),
    (2,'V122',    122,'10.108.150.122'::inet,'2021-11-08 11:54:26.82885'::timestamp,'2021-11-08 11:54:26.82885'::timestamp),
    (3,'V123',    123,'10.108.150.123'::inet,'2021-11-08 11:54:26.82885'::timestamp,'2021-11-08 11:54:26.82885'::timestamp),
    (4,'V124',    124,'10.108.150.124'::inet,'2021-11-08 11:54:26.82885'::timestamp,'2021-11-08 11:54:26.82885'::timestamp),
    (5,'V122',    122,'10.108.150.122'::inet,'2021-11-08 11:54:26.82885'::timestamp,'2021-11-08 11:54:26.82885'::timestamp),
    (6,'V125',    122,'10.108.150.125'::inet,'2021-11-08 11:54:26.82885'::timestamp,'2021-11-08 11:54:26.82885'::timestamp);
    
    select distinct on (ip_address) * 
    from test where id<>1 
    order by ip_address,created_at;
    -- id |  name   | vin |   ip_address   |         created_at         |         updated_at
    ------+---------+-----+----------------+----------------------------+----------------------------
    --  0 | default |   0 | 0.0.0.0/0      | 2021-11-08 11:54:26.822623 | 2021-11-08 11:54:26.822623
    --  2 | V122    | 122 | 10.108.150.122 | 2021-11-08 11:54:26.82885  | 2021-11-08 11:54:26.82885
    --  3 | V123    | 123 | 10.108.150.123 | 2021-11-08 11:54:26.82885  | 2021-11-08 11:54:26.82885
    --  4 | V124    | 124 | 10.108.150.124 | 2021-11-08 11:54:26.82885  | 2021-11-08 11:54:26.82885
    --  6 | V125    | 122 | 10.108.150.125 | 2021-11-08 11:54:26.82885  | 2021-11-08 11:54:26.82885
    --(5 rows)