I have one table named as baseTable containing many columns however I am using 3 columns named as Material_Type, Material_Desc (Have null values), Material_Number to find duplicates using row_num and partition by. Note: I need to filter duplicates based on 3 conditions.
Sample Table :
Material_Type Material_Desc Material_Number
ABC XYZ 1
ABC XYZ 1
ABC XYZ 2
ABC XYZ 3
DEF IMM 1
LMN NULL 1
LMN NULL 2
I only want to have duplicate values in newTable and wanted to remove distinct values.
Desired Output :
Material_Type Material_Desc Material_Number new
ABC XYZ 1 1
ABC XYZ 1 2
ABC XYZ 2 3
ABC XYZ 3 4
LMN NULL 1 1
LMN NULL 2 2
I used below query but not getting expected output as it's not including null values from column Material_Desc and didn't make partition with Null and also creates unwanted duplicate records.
query used :
create table newTable as
with mycte as
(
select
m.MATERIAL_NUMBER
,m.MATERIAL_TYPE
,m.Material_Desc,
row_number() over(partition BY d.MATERIAL_TYPE,d.Material_Desc order by d.MATERIAL_NUMBER) as new
from baseTable m
inner join
(
select MATERIAL_NUMBER,MATERIAL_TYPE,Material_Desc,count(*) from baseTable group by
MATERIAL_NUMBER,MATERIAL_TYPE,Material_Desc having count(*) > 1
) d on d.MATERIAL_NUMBER <> m.MATERIAL_NUMBER and d.MATERIAL_TYPE=m.MATERIAL_TYPE
and d.Material_Desc= m.Material_Desc)
select * from mycte
Any help would be appreciated.
Just use row_number()
and count(*)
as a window function:
select Material_Type, Material_Desc, Material_Number,
row_number() over (partition by Material_Type, Material_Desc order by Material_Number) as new
from (select t.*,
count(*) over (partition by Material_Type, Material_Desc) as cnt
from t
) t
where cnt > 1;
This works for the data you have provided by simply counting the rows for each type and description. If you really do need for the material numbers to be different, one method is min()
and max()
:
select Material_Type, Material_Desc, Material_Number,
row_number() over (partition by Material_Type, Material_Desc order by Material_Number) as new
from (select t.*,
min(Material_Number) over (partition by Material_Type, Material_Desc) as min_Material_Number,
max(Material_Number) over (partition by Material_Type, Material_Desc) as max_Material_Number
from t
) t
where min_Material_Number <> max_Material_Number;