I have two columns In Oracle DB which I want to compare, and if the strings are different, I want to show the differences in other columns it. I know I miss something. So for example:
SELECT A.CD_KEY01,
A.TEXT_01,
B.TEXT_02,
UTL_MATCH.edit_distance_similarity(A.TEXT_01, B.TEXT_02) AS distance_similarity
FROM TB_TABLE_01 A
JOIN TB_TABLE_02 B
ON A.CD_KEY01 = B.CD_KEY02
Example output I get:
CD_KEY01 | TEXT_01 | TEXT_02 | DISTANCE_SIMILARITY
111 | Superman is good | Superman is good | 100
222 | Superman is bad | Superman is bad | 100
333 | Superman is handsome | Hulk is ugly | 33
444 | Superman is awful | Batman is awful | 90
Example output I need:
CD_KEY01 | TEXT_01 | TEXT_02 | DISTANCE_SIMILARITY | DIFF_01 | DIFF_02
111 | Superman is good | Superman is good | 100 | NULL | NULL
222 | Superman is bad | Superman is bad | 100 | NULL | NULL
333 | Superman is handsome | Hulk is ugly | 33 | Hulk | ugly
444 | Superman is awful | Batman is awful | 90 | Batman | NULL
I doubt there is an easy way. One method is to divide strings into words somehow (recursive way below), compare word by word and pivot result:
with
a(key, text_a, word, rn) as (
select cd_key01, text_01, regexp_substr(text_01, '(\w+)', 1, 1), 1
from table_01
union all
select key, text_a, regexp_substr(text_a, '(\w+)', 1, rn + 1), rn + 1
from a
where regexp_substr(text_a, '(\w+)', 1, rn + 1) is not null),
b(key, text_b, word, rn) as (
select cd_key02, text_02, regexp_substr(text_02, '(\w+)', 1, 1), 1
from table_02
union all
select key, text_b, regexp_substr(text_b, '(\w+)', 1, rn + 1), rn + 1
from b where regexp_substr(text_b, '(\w+)', 1, rn + 1) is not null)
select *
from (
select key, rn, text_a, text_b,
case when a.word <> b.word then b.word end word
from a full join b using (key, rn))
pivot (max(word) for rn in (1 w1, 2 w2, 3 w3)) order by key
This query shows comparison for first three words, is
is also compared. If strings may have different number of words you have to be carefull and modify case when
part handling nulls properly.