I have to compare 2 different sources and identify all the mismatches for all IDs
Source_excel
table
+-----+-------------+------+----------+
| id | name | City | flag |
+-----+-------------+------+----------+
| 101 | Plate | NY | Ready |
| 102 | Back washer | NY | Sold |
| 103 | Ring | MC | Planning |
| 104 | Glass | NMC | Ready |
| 107 | Cover | PR | Ready |
+-----+-------------+------+----------+
Source_dw
table
+-----+----------+------+----------+
| id | name | City | flag |
+-----+----------+------+----------+
| 101 | Plate | NY | Planning |
| 102 | Nut | TN | Expired |
| 103 | Ring | MC | Planning |
| 104 | Top Wire | NY | Ready |
| 105 | Bolt | MC | Expired |
+-----+----------+------+----------+
Expected result
+-----+-------------+----------+------------+----------+------------+---------+------------------+
| ID | excel_name | dw_name | excel_flag | dw_flag | excel_city | dw_city | RESULT |
+-----+-------------+----------+------------+----------+------------+---------+------------------+
| 101 | Plate | Plate | Ready | Planning | NY | NY | FLAG_MISMATCH |
| 102 | Back washer | Nut | Sold | Expired | NY | TN | NAME_MISMATCH |
| 102 | Back washer | Nut | Sold | Expired | NY | TN | FLAG_MISMATCH |
| 102 | Back washer | Nut | Sold | Expired | NY | TN | CITY_MISMATCH |
| 103 | Ring | Ring | Planning | Planning | MC | MC | ALL_MATCH |
| 104 | Glass | Top Wire | Ready | Ready | NMC | NY | NAME_MISMATCH |
| 104 | Glass | Top Wire | Ready | Ready | NMC | NY | CITY_MISMATCH |
| 107 | Cover | | Ready | | PR | | MISSING IN DW |
| 105 | | Bolt | | Expired | | MC | MISSING IN EXCEL |
+-----+-------------+----------+------------+----------+------------+---------+------------------+
I have tried the below query but it is giving only one mismatch.
select ISNULL(EXCEL.ID,DW.ID) ID,
excel.name as excel_name,dw.name as dw_name,
excel.flag as excel_flag,dw.flag as dw_flag,
excel.city as excel_city,dw.city as dw_city,
RESULT = CASE WHEN excel.ID IS NULL THEN 'MISSING IN EXCEL'
WHEN dw.ID IS NULL THEN 'MISSING IN DW'
WHEN excel.NAME<>dw.NAME THEN 'NAME_MISMATCH'
WHEN excel.CITY<>dw.CITY THEN 'CITY_MISMATCH'
WHEN excel.FLAG <> dw.FLAG THEN 'FLAG_MISMATCH'
ELSE 'ALL_MATCH' END
from source_excel excel
FULL OUTER JOIN source_dw dw ON excel.id=dw.id
Actual output
+-----+-------------+----------+------------+----------+------------+---------+------------------+
| ID | excel_name | dw_name | excel_flag | dw_flag | excel_city | dw_city | RESULT |
+-----+-------------+----------+------------+----------+------------+---------+------------------+
| 101 | Plate | Plate | Ready | Planning | NY | NY | FLAG_MISMATCH |
| 102 | Back washer | Nut | Sold | Expired | NY | TN | NAME_MISMATCH |
| 103 | Ring | Ring | Planning | Planning | MC | MC | ALL_MATCH |
| 104 | Glass | Top Wire | Ready | Ready | NMC | NY | NAME_MISMATCH |
| 107 | Cover | | Ready | | PR | | MISSING IN DW |
| 105 | | Bolt | | Expired | | MC | MISSING IN EXCEL |
+-----+-------------+----------+------------+----------+------------+---------+------------------+
I understand that case
expression will only check the first satisfying condition. Is there any other way I can check all the condition?
If I follow you correctly, you want one row per mismatch, or one row indicating that everything matches.
You can use cross apply
to generate the rows, like so:
SELECT
COALESCE(xl.ID, dw.ID) ID,
xl.name as excel_name,dw.name as dw_name,
xl.flag as excel_flag,dw.flag as dw_flag,
xl.city as excel_city,dw.city as dw_city,
x.result
FROM source_excel xl
FULL OUTER JOIN source_dw dw ON xl.id = dw.id
CROSS APPLY (VALUES
(CASE WHEN xl.ID IS NULL THEN 'MISSING IN EXCEL' END),
(CASE WHEN dw.ID IS NULL THEN 'MISSING IN DW' END),
(CASE WHEN WHEN xl.NAME <> dw.NAME THEN 'NAME_MISMATCH' END),
(CASE WHEN xl.CITY <> dw.CITY THEN 'CITY_MISMATCH' END),
(CASE WHEN xl.FLAG <> dw.FLAG THEN 'FLAG_MISMATCH' END),
(CASE WHEN
xl.ID = dw.ID
AND xl.NAME = dw.NAME
AND xl.CITY = dw.CITY
AND xl.FLAG = dw.FLAG
THEN 'ALL_MATCH' END)
) x(result)
WHERE x.result IS NOT NULL