Tags: python, python-3.x, csv, duplicates, export-to-csv

Find Duplicates from columns in CSV and Remove before write


I'm creating a CSV file by reading multiple text files I've created, as below:

Col1,  Col2,  Col3,  Col4
name1, copy, create, copy
       cut           paste

name2, data, null , data
       cut           cut

I want to remove duplicates from Column 4 by comparing it against Column 2 before writing to the CSV. For example, in row 1 above, Column 4 should only contain paste; likewise, in row 2, Column 4 should be empty.

The desired output would be:

Col1,  Col2,  Col3,  Col4
name1, copy, create, paste
       cut           

name2, data, null , 
       cut           

I have something like the below:

import os

stat2 = 'Col1,Col2,Col3,Col4\n'
text_files = os.listdir('./data/')
for pack in text_files:
    with open("./data/" + pack, "r") as file:
        perp = file.read()
    stat2 += pack + ',"'

    # I'm iterating through different sets of lists and matching against all the files.
    for word in package:
        stat2 += word + "\n"
    stat2 += '","'

    for word in data:
        stat2 += word + "\n"
    stat2 += '","'

    for word in perp.splitlines():
        stat2 += word + "\n"
    stat2 += '"' + "\n"

f = open("data/csv_file.csv", "w")
f.write(stat2)
f.close()

I want to remove the duplicates before writing to the CSV. Can anyone suggest how to update this? Thanks.


Solution

  • The question is not very clear, but what you can generally do is compare the elements of one list against another list and remove the duplicates from the target list. Suppose, in this instance, col2 is the target list:

    col1 = ['copy','create','cut']
    col2 = ['copy','create','cut','delete']
    

    You can use a list comprehension to create a new list containing only the values that do not appear in col1:

    col2 = [i for i in col2 if i not in col1]
    

    And then if you print the result, you'll get this for col2:

    ['delete']
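
    Applying the same idea to the original problem, here is a minimal sketch of how the rows from the question could be deduplicated and written out with the standard csv module. The `rows` data below is a hypothetical reconstruction of the question's layout (each of Col2–Col4 holding a list of words); the key step is the same list comprehension, filtering Col4 against Col2 before writing:

    ```python
    import csv

    # Hypothetical rows mirroring the question's example:
    # [Col1, Col2 words, Col3 words, Col4 words]
    rows = [
        ["name1", ["copy", "cut"], ["create"], ["copy", "paste"]],
        ["name2", ["data", "cut"], ["null"], ["data", "cut"]],
    ]

    cleaned = []
    for name, col2, col3, col4 in rows:
        # Keep only the Col4 words that do not already appear in Col2.
        col4 = [w for w in col4 if w not in col2]
        cleaned.append([name, "\n".join(col2), "\n".join(col3), "\n".join(col4)])

    # csv.writer handles the quoting of multi-line cells automatically,
    # which avoids the manual '","' bookkeeping from the question.
    with open("csv_file.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Col1", "Col2", "Col3", "Col4"])
        writer.writerows(cleaned)
    ```

    For row 1 this leaves only paste in Col4, and for row 2 it leaves Col4 empty, matching the desired output above.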