Search code examples
oracledatasetcomparisonunion

Comparing 2 sets of data in the same table - oracle


Lets say i have a table with the following data.

+-------------------------------+
|   UniqueID  Name    Dataset   |
+-------------------------------+
| 1         ABC1    A:B;C:D;E:F |
| 2         ABC2    A:B;C:D;R:S |
| 3         ABC3    C:4;G:5;A:B |
| 4         ABC4    A:B;C:D;E:F |
+-------------------------------+

where dataset is combinations of data e.g. A:B, delimited by ;

Effectively what i want to do do is compare each set of data against the other record and build a picture up like the following by comparing 'Dataset' to get result like this (this is just the first comparison)

+--------------------------------------------------------------------------+
| UniqueID  Name    UniqueID   Name    Matched on  OnlyinBase OnlyinTarget |
+--------------------------------------------------------------------------+
| 1         ABC1    2          ABC2    A:B;C:D       E:F           R:S     |
| etc                                                                      |
+--------------------------------------------------------------------------+

What is the best way to do the above?


Solution

  • I think your sample data set isn't correct.

    • Shouldn't for ID = 2 D be separated from R by semi-colon (not comma)?
    • Shouldn't for ID = 3 5 be separated from A by semi-colon (not colon)?

    I fixed that (if it should be fixed) and wrote the following PL/SQL code; I don't know whether you can do that in pure SQL. Have a look, see if it helps.

    What does it do? Using nested loops, splits all DATASETs into rows and - using SET operators (INTERSECT, MINUS) decides which group the result belongs to (matched / only in base / only in target).

    SQL> select * from test;
    
            ID NAME DATASET
    ---------- ---- --------------------
             1 ABC1 A:B;C:D;E:F
             2 ABC2 A:B;C:D;R:S
             3 ABC3 C:4;G:5;A:B
             4 ABC4 A:B;C:D;E:F
    
    SQL> set serveroutput on
    SQL>
    SQL> DECLARE
      2     l_matched   VARCHAR2 (20);
      3     l_base      VARCHAR2 (20);
      4     l_target    VARCHAR2 (20);
      5  BEGIN
      6     FOR cur_1 IN (  SELECT id, name, dataset
      7                       FROM test
      8                   ORDER BY id)
      9     LOOP
     10        FOR cur_2 IN (  SELECT id, name, dataset
     11                          FROM test
     12                         WHERE id > cur_1.id
     13                      ORDER BY id)
     14        LOOP
     15           -- Matched
     16           SELECT LISTAGG (col, ';') WITHIN GROUP (ORDER BY col)
     17             INTO l_matched
     18             FROM (    SELECT REGEXP_SUBSTR (cur_1.dataset,
     19                                             '[^;]+',
     20                                             1,
     21                                             LEVEL)
     22                                 col
     23                         FROM DUAL
     24                   CONNECT BY LEVEL <= REGEXP_COUNT (cur_1.dataset, ';') + 1
     25                   INTERSECT
     26                       SELECT REGEXP_SUBSTR (cur_2.dataset,
     27                                             '[^;]+',
     28                                             1,
     29                                             LEVEL)
     30                                 col
     31                         FROM DUAL
     32                   CONNECT BY LEVEL <= REGEXP_COUNT (cur_2.dataset, ';') + 1);
     33
     34           -- Only in base
     35           SELECT LISTAGG (col, ';') WITHIN GROUP (ORDER BY col)
     36             INTO l_base
     37             FROM (    SELECT REGEXP_SUBSTR (cur_1.dataset,
     38                                             '[^;]+',
     39                                             1,
     40                                             LEVEL)
     41                                 col
     42                         FROM DUAL
     43                   CONNECT BY LEVEL <= REGEXP_COUNT (cur_1.dataset, ';') + 1
     44                   MINUS
     45                       SELECT REGEXP_SUBSTR (cur_2.dataset,
     46                                             '[^;]+',
     47                                             1,
     48                                             LEVEL)
     49                                 col
     50                         FROM DUAL
     51                   CONNECT BY LEVEL <= REGEXP_COUNT (cur_2.dataset, ';') + 1);
     52
     53           -- Only in target
     54           SELECT LISTAGG (col, ';') WITHIN GROUP (ORDER BY col)
     55             INTO l_target
     56             FROM (    SELECT REGEXP_SUBSTR (cur_2.dataset,
     57                                             '[^;]+',
     58                                             1,
     59                                             LEVEL)
     60                                 col
     61                         FROM DUAL
     62                   CONNECT BY LEVEL <= REGEXP_COUNT (cur_2.dataset, ';') + 1
     63                   MINUS
     64                       SELECT REGEXP_SUBSTR (cur_1.dataset,
     65                                             '[^;]+',
     66                                             1,
     67                                             LEVEL)
     68                                 col
     69                         FROM DUAL
     70                   CONNECT BY LEVEL <= REGEXP_COUNT (cur_1.dataset, ';') + 1);
     71
     72           DBMS_OUTPUT.put_line (
     73                 cur_1.id
     74              || ' '
     75              || cur_1.name
     76              || ' '
     77              || cur_2.id
     78              || ' '
     79              || cur_2.name
     80              || ' '
     81              || rpad(l_matched, 20, ' ')
     82              || ' '
     83              || rpad(l_base, 20, ' ')
     84              || ' '
     85              || rpad(l_target, 20, ' '));
     86        END LOOP;
     87     END LOOP;
     88  END;
     89  /
    1 ABC1 2 ABC2 A:B;C:D              E:F                  R:S
    1 ABC1 3 ABC3 A:B                  C:D;E:F              C:4;G:5
    1 ABC1 4 ABC4 A:B;C:D;E:F
    2 ABC2 3 ABC3 A:B                  C:D;R:S              C:4;G:5
    2 ABC2 4 ABC4 A:B;C:D              R:S                  E:F
    3 ABC3 4 ABC4 A:B                  C:4;G:5              C:D;E:F
    
    PL/SQL procedure successfully completed.
    
    SQL>