Search code examples
oracle-databasestring-comparisoncase-whenutl-match

How can I show the differences beetween 2 columns from different tables using UTL_MATCH in Oracle?


I have two columns In Oracle DB which I want to compare, and if the strings are different, I want to show the differences in other columns it. I know I miss something. So for example:

SELECT A.CD_KEY01,
       A.TEXT_01,
       B.TEXT_02,
       UTL_MATCH.edit_distance_similarity(A.TEXT_01, B.TEXT_02) AS distance_similarity
FROM TB_TABLE_01 A 
JOIN TB_TABLE_02 B
ON A.CD_KEY01 = B.CD_KEY02

Example output I get:

CD_KEY01  |  TEXT_01              |  TEXT_02               | DISTANCE_SIMILARITY 
   111    |  Superman is good     |  Superman is good      | 100
   222    |  Superman is bad      |  Superman is bad       | 100
   333    |  Superman is handsome |  Hulk is ugly          | 33
   444    |  Superman is awful    |  Batman is awful       | 90

Example output I need:

CD_KEY01  |  TEXT_01              |  TEXT_02               | DISTANCE_SIMILARITY | DIFF_01 | DIFF_02 
   111    |  Superman is good     |  Superman is good      | 100                 | NULL    | NULL
   222    |  Superman is bad      |  Superman is bad       | 100                 | NULL    | NULL
   333    |  Superman is handsome |  Hulk is ugly          | 33                  | Hulk    | ugly
   444    |  Superman is awful    |  Batman is awful       | 90                  | Batman  | NULL

Solution

  • I doubt there is an easy way. One method is to divide strings into words somehow (recursive way below), compare word by word and pivot result:

    with 
      a(key, text_a, word, rn) as (
        select cd_key01, text_01, regexp_substr(text_01, '(\w+)', 1, 1), 1 
          from table_01 
        union all
        select key, text_a, regexp_substr(text_a, '(\w+)', 1, rn + 1), rn + 1 
          from a 
          where regexp_substr(text_a, '(\w+)', 1, rn + 1) is not null),
      b(key, text_b, word, rn) as (
        select cd_key02, text_02, regexp_substr(text_02, '(\w+)', 1, 1), 1 
          from table_02 
        union all
        select key, text_b, regexp_substr(text_b, '(\w+)', 1, rn + 1), rn + 1 
          from b where regexp_substr(text_b, '(\w+)', 1, rn + 1) is not null)
    select *
      from (
        select key, rn, text_a, text_b, 
               case when a.word <> b.word then b.word end word 
          from a full join b using (key, rn))
      pivot (max(word) for rn in (1 w1, 2 w2, 3 w3)) order by key
    

    demo

    This query shows comparison for first three words, is is also compared. If strings may have different number of words you have to be carefull and modify case when part handling nulls properly.