I have two CSV files, and some records appear in both. I want to remove the duplicates and keep only the unique records. How can I do this with Apache NiFi?
Thank you!
input1.csv:
id,surname,name
1,ali,veli
2,mert,tolga
input2.csv:
id,surname,name
1,ali,veli
3,ahmet,ozan
output.csv:
id,surname,name
1,ali,veli
2,mert,tolga
3,ahmet,ozan
You can do this with record-based processing: use MergeRecord to merge the two CSV files into one FlowFile, then use the QueryRecord processor to deduplicate with a query like:
SELECT * FROM FLOWFILE
INTERSECT
SELECT * FROM FLOWFILE
Note that SELECT DISTINCT FROM FLOWFILE will not work. See the Calcite SQL reference: https://calcite.apache.org/docs/reference.html
So you would need MergeRecord followed by QueryRecord. On the output of QueryRecord you will get the deduplicated CSV file.
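To see why the INTERSECT query removes duplicates, here is a minimal Python sketch that runs the same query shape outside NiFi. It is only an illustration under a few assumptions: it uses SQLite from the Python standard library instead of Calcite (which QueryRecord uses), the table name flowfile stands in for NiFi's FLOWFILE, and the helpers read_records and deduplicate are hypothetical names, not part of NiFi. Concatenating the two inputs plays the role of MergeRecord.

```python
import csv
import io
import sqlite3

# Sample data copied from the question.
INPUT1 = "id,surname,name\n1,ali,veli\n2,mert,tolga\n"
INPUT2 = "id,surname,name\n1,ali,veli\n3,ahmet,ozan\n"


def read_records(text):
    """Parse CSV text into a list of (id, surname, name) tuples, skipping the header."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [tuple(row) for row in reader]


def deduplicate(records):
    """Load records into an in-memory table and intersect the table with itself.

    INTERSECT has set semantics, so each distinct row is returned once.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical stand-in for the FLOWFILE table that QueryRecord exposes.
        conn.execute("CREATE TABLE flowfile (id TEXT, surname TEXT, name TEXT)")
        conn.executemany("INSERT INTO flowfile VALUES (?, ?, ?)", records)
        rows = conn.execute(
            "SELECT * FROM flowfile INTERSECT SELECT * FROM flowfile"
        ).fetchall()
    finally:
        conn.close()
    # Sort in Python so the result order does not depend on the SQL engine.
    return sorted(rows)


# Concatenating both inputs mimics what MergeRecord does in the flow.
merged = read_records(INPUT1) + read_records(INPUT2)
unique = deduplicate(merged)

for row in unique:
    print(",".join(row))
```

The merged list holds four records (1,ali,veli appears twice) and the query returns three. Row order coming out of a set operation is not guaranteed, so if order matters in the real flow you may want an ORDER BY in the QueryRecord query.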