Search code examples
sqlgroup-bydistinct

Retrieve records with unique or largest string in that column which contains other strings in that column and matching count


I have the following table:

this_table (phrase is always unique)

phrase count
elemo 11
e 37
elemo a 11
r sundt 6
wit 102
lolo m 102
o 1475
mo 111
lo m 186
domo 111

I'd like a query that removes results if their (record A) phrase is contained within another record's (record B) phrase AND their (record A) count matches the containing record's (record B) count.

Results

phrase count
e 37
elemo a 11
r sundt 6
wit 102
lolo m 102
o 1475
lo m 186
domo 111

Solution

  • You didn't specify the database, but this should do it. The idea is about the same as suggested by Barmar in the comments. I find it easier to implement the logic using not exists

    with cte (phrase, counts) as
      
    (select 'elemo',    11  union all
     select 'e',        37  union all
     select 'elemo a',  11  union all
     select 'r sundt',  6   union all
     select 'wit',     102  union all
     select 'lolo m',  102  union all
     select 'o',       1475 union all
     select 'mo',      111  union all
     select 'lo m',    186  union all
     select 'domo',    111)
    
    select *
    from cte a
    where not exists (select *
                      from cte b
                      where a.counts=b.counts and 
                            a.phrase <> b.phrase and 
                            a.phrase like concat('%',b.phrase,'%'))