linux sorting awk duplicates unique

Removing duplicate rows from a CSV file based on multiple columns using awk


I've used this code to remove duplicates in column #3 across two files:

awk -F, 'NR==FNR{seen[$3]; next} !($3 in seen)' dublicates.txt need_check.csv > output.csv

But how do I check for duplicates across multiple columns (#2, #3 and #4)?


Solution

  • Due to an ambiguity in your question, there are two possible answers.

    If you consider records to be duplicates only when fields #2, #3 AND #4 all match, you should run:

    awk -F, '{key=$2 FS $3 FS $4} NR==FNR{a[key]; next} !(key in a)' dublicates.txt need_check.csv > output.csv
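
    As a quick check of that behaviour, here is a small sketch with made-up sample data (the file contents are assumptions for illustration, not taken from the question). It also shows why the key joins the fields with FS: without a separator, the fields "a", "b", "c" and "ab", "", "c" would both collapse to the same key "abc".

    ```shell
    # Made-up sample data, for illustration only.
    printf '%s\n' 'x,a,b,c' > dublicates.txt
    printf '%s\n' '1,a,b,c' '2,a,b,z' '3,ab,,c' > need_check.csv

    # Drop records of need_check.csv whose fields #2, #3 and #4
    # all match some record of dublicates.txt.
    awk -F, '{key=$2 FS $3 FS $4} NR==FNR{a[key]; next} !(key in a)' dublicates.txt need_check.csv > output.csv

    cat output.csv
    ```

    With this data only record 1 is dropped; records 2 and 3 are kept because at least one of their three fields differs.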
    

    If records count as duplicates when any one of those fields is repeated, then you have to code it in another way.
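
    One possible way to code that second reading is sketched below. It assumes that "repeated" means the value appears in the same column of dublicates.txt (field #2 compared with field #2, and so on); the array names s2, s3, s4 and the sample data are made up for illustration.

    ```shell
    # Made-up sample data, for illustration only.
    printf '%s\n' 'x,a,b,c' 'x,d,e,f' > dublicates.txt
    printf '%s\n' '1,a,q,r' '2,m,n,o' '3,p,e,s' '4,t,u,f' '5,b,c,a' > need_check.csv

    # First file: remember the values of each column in its own array.
    # Second file: print a record only if none of its fields #2, #3, #4
    # was seen in the same column of the first file.
    awk -F, '
        NR==FNR { s2[$2]; s3[$3]; s4[$4]; next }
        !(($2 in s2) || ($3 in s3) || ($4 in s4))
    ' dublicates.txt need_check.csv > output.csv

    cat output.csv
    ```

    Here records 1, 3 and 4 are dropped because one field each matches, while record 5 is kept: its values do occur in dublicates.txt, but only in different columns. If a match in any column should count, a single shared array would be needed instead.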

    It is good practice on SO to include a sample of the input and the corresponding desired output.