Lets say i have a table with the following data.
+-------------------------------+
| UniqueID Name Dataset |
+-------------------------------+
| 1 ABC1 A:B;C:D;E:F |
| 2 ABC2 A:B;C:D;R:S |
| 3 ABC3 C:4;G:5;A:B |
| 4 ABC4 A:B;C:D;E:F |
+-------------------------------+
where dataset is combinations of data e.g. A:B, delimited by ;
Effectively what i want to do do is compare each set of data against the other record and build a picture up like the following by comparing 'Dataset' to get result like this (this is just the first comparison)
+--------------------------------------------------------------------------+
| UniqueID Name UniqueID Name Matched on OnlyinBase OnlyinTarget |
+--------------------------------------------------------------------------+
| 1 ABC1 2 ABC2 A:B;C:D E:F R:S |
| etc |
+--------------------------------------------------------------------------+
What is the best way to do the above?
I think your sample data set isn't correct.
ID = 2
D
be separated from R
by semi-colon (not comma)? ID = 3
5
be separated from A
by semi-colon (not colon)?I fixed that (if it should be fixed) and wrote the following PL/SQL code; I don't know whether you can do that in pure SQL. Have a look, see if it helps.
What does it do? Using nested loops, splits all DATASET
s into rows and - using SET
operators (INTERSECT
, MINUS
) decides which group the result belongs to (matched / only in base / only in target).
SQL> select * from test;
ID NAME DATASET
---------- ---- --------------------
1 ABC1 A:B;C:D;E:F
2 ABC2 A:B;C:D;R:S
3 ABC3 C:4;G:5;A:B
4 ABC4 A:B;C:D;E:F
SQL> set serveroutput on
SQL>
SQL> DECLARE
2 l_matched VARCHAR2 (20);
3 l_base VARCHAR2 (20);
4 l_target VARCHAR2 (20);
5 BEGIN
6 FOR cur_1 IN ( SELECT id, name, dataset
7 FROM test
8 ORDER BY id)
9 LOOP
10 FOR cur_2 IN ( SELECT id, name, dataset
11 FROM test
12 WHERE id > cur_1.id
13 ORDER BY id)
14 LOOP
15 -- Matched
16 SELECT LISTAGG (col, ';') WITHIN GROUP (ORDER BY col)
17 INTO l_matched
18 FROM ( SELECT REGEXP_SUBSTR (cur_1.dataset,
19 '[^;]+',
20 1,
21 LEVEL)
22 col
23 FROM DUAL
24 CONNECT BY LEVEL <= REGEXP_COUNT (cur_1.dataset, ';') + 1
25 INTERSECT
26 SELECT REGEXP_SUBSTR (cur_2.dataset,
27 '[^;]+',
28 1,
29 LEVEL)
30 col
31 FROM DUAL
32 CONNECT BY LEVEL <= REGEXP_COUNT (cur_2.dataset, ';') + 1);
33
34 -- Only in base
35 SELECT LISTAGG (col, ';') WITHIN GROUP (ORDER BY col)
36 INTO l_base
37 FROM ( SELECT REGEXP_SUBSTR (cur_1.dataset,
38 '[^;]+',
39 1,
40 LEVEL)
41 col
42 FROM DUAL
43 CONNECT BY LEVEL <= REGEXP_COUNT (cur_1.dataset, ';') + 1
44 MINUS
45 SELECT REGEXP_SUBSTR (cur_2.dataset,
46 '[^;]+',
47 1,
48 LEVEL)
49 col
50 FROM DUAL
51 CONNECT BY LEVEL <= REGEXP_COUNT (cur_2.dataset, ';') + 1);
52
53 -- Only in target
54 SELECT LISTAGG (col, ';') WITHIN GROUP (ORDER BY col)
55 INTO l_target
56 FROM ( SELECT REGEXP_SUBSTR (cur_2.dataset,
57 '[^;]+',
58 1,
59 LEVEL)
60 col
61 FROM DUAL
62 CONNECT BY LEVEL <= REGEXP_COUNT (cur_2.dataset, ';') + 1
63 MINUS
64 SELECT REGEXP_SUBSTR (cur_1.dataset,
65 '[^;]+',
66 1,
67 LEVEL)
68 col
69 FROM DUAL
70 CONNECT BY LEVEL <= REGEXP_COUNT (cur_1.dataset, ';') + 1);
71
72 DBMS_OUTPUT.put_line (
73 cur_1.id
74 || ' '
75 || cur_1.name
76 || ' '
77 || cur_2.id
78 || ' '
79 || cur_2.name
80 || ' '
81 || rpad(l_matched, 20, ' ')
82 || ' '
83 || rpad(l_base, 20, ' ')
84 || ' '
85 || rpad(l_target, 20, ' '));
86 END LOOP;
87 END LOOP;
88 END;
89 /
1 ABC1 2 ABC2 A:B;C:D E:F R:S
1 ABC1 3 ABC3 A:B C:D;E:F C:4;G:5
1 ABC1 4 ABC4 A:B;C:D;E:F
2 ABC2 3 ABC3 A:B C:D;R:S C:4;G:5
2 ABC2 4 ABC4 A:B;C:D R:S E:F
3 ABC3 4 ABC4 A:B C:4;G:5 C:D;E:F
PL/SQL procedure successfully completed.
SQL>