Search code examples
sqloracle-databasegroupingdatabase-performancerelational-division

SQL - find exact group of records in large table


I have following data:

ID  --- GRP_ID  --- REC_VAL
1   --- 1       --- A
2   --- 2       --- A
3   --- 2       --- B
4   --- 3       --- A
5   --- 3       --- B
6   --- 3       --- C
7   --- 4       --- A
8   --- 4       --- B
9   --- 4       --- C
10  --- 5       --- A
11  --- 5       --- B
12  --- 5       --- E

Is there a way how to find id of record groups that have same values ? (in this case only grp_id 3 and 4 have same values)

Second question:

Is there effecient way how to find exact grp_id when i had a set of values ? My solution is not very quick because table with groups has over 6mil. records:

-- Large table - up to 6m records
create table tmp_grp (id number, grp_id number, rec_val varchar2(10));
--
insert into tmp_grp(id, grp_id, rec_val) values (1, 1, 'A');
insert into tmp_grp(id, grp_id, rec_val) values (2, 2, 'A');
insert into tmp_grp(id, grp_id, rec_val) values (3, 2, 'B');
insert into tmp_grp(id, grp_id, rec_val) values (4, 3, 'A');
insert into tmp_grp(id, grp_id, rec_val) values (5, 3, 'B');
insert into tmp_grp(id, grp_id, rec_val) values (6, 3, 'C');
insert into tmp_grp(id, grp_id, rec_val) values (7, 4, 'A');
insert into tmp_grp(id, grp_id, rec_val) values (8, 4, 'B');
insert into tmp_grp(id, grp_id, rec_val) values (9, 4, 'C');
insert into tmp_grp(id, grp_id, rec_val) values (10, 5, 'A');
insert into tmp_grp(id, grp_id, rec_val) values (11, 5, 'B');
insert into tmp_grp(id, grp_id, rec_val) values (12, 5, 'E');
commit;
--
-- CTE representing record group for asking  
WITH datrec AS
 (SELECT 'A' rec FROM dual UNION ALL 
  SELECT 'B' rec FROM dual)
--
SELECT x.grp_id
  FROM (
  -- Count of joined records 
  SELECT COUNT(1) cnt, t.grp_id
          FROM tmp_grp t
          JOIN datrec d
            ON d.rec = t.rec_val
         GROUP BY t.grp_id
  -- 
  ) x
 WHERE 
 -- Count of all data records
 x.cnt = (SELECT COUNT(1) FROM datrec)
 -- Count of all group records
 AND x.cnt = (SELECT COUNT(1) FROM tmp_grp g WHERE x.grp_id = g.grp_id);
 --

This question is similar to Find group of records that match multiple values , but this topic only cover exact set of values (number of values and values in column rec of datrec will be provided by another query) and query return groups which contains this set. I need to return only exact match.

UPDATE - added data in table for better clarification

Also related to How to compare groups of tuples in sql


Solution

  • Here is a way that avoids joining the base table to itself. It will be more efficient especially if there are several (many?) possible values of rec_val for each grp_id. It can be made faster still if the distinct grp_id already exist somewhere in your data; I create them on the fly.

    with gid ( grp_id ) as (
           select distinct grp_id from tmp_grp
         ),
         prep ( grp_id_1, grp_id_2, rec_val ) as (
           select t.grp_id, g.grp_id, t.rec_val
             from tmp_grp t join gid g on t.grp_id < g.grp_id
           union all
           select g.grp_id, t.grp_id, t.rec_val
             from gid g join tmp_grp t on g.grp_id < t.grp_id
         ),
         counts ( grp_id_1, grp_id_2, cnt ) as (
           select   grp_id_1, grp_id_2, count(*)
           from     prep
           group by grp_id_1, grp_id_2, rec_val
         )
    select   grp_id_1, grp_id_2
    from     counts
    group by grp_id_1, grp_id_2
    having min(cnt) = 2
    ;