I have two csv files named test1.csv
and test2.csv
and they both have a column named 'Name'. I would like to compare each row in this Name
column between both files and output the ones that don't match to a third file. I have seen some examples using pandas, but none worked for my situation. Can anyone help me get a script going for this?
Test2 will be updated to include all values from test1 plus new values not included in test1 (which are the ones i want saved to a third file)
An example of what the columns look like is:
test1.csv:
Name Number Status
gfd454 456 Disposed
3v4fd 521 Disposed
th678iy 678 Disposed
test2.csv
Name Number Status
gfd454 456 Disposed
3v4fd 521 Disposed
th678iy 678 Disposed
vb556h 665 Disposed
See below.
The idea is to read the names into s python set data structure and find the new names by doing set substruction.
1.csv:
Name Number
A 12
B 34
C 45
2.csv
Name Number
A 12
B 34
C 45
D 77
Z 67
The code below will print {'D', 'Z'} which are the new names.
def read_file_to_set(file_name):
with open(file_name) as f:
return set(l.strip().split()[0] for x,l in enumerate(f.readlines()) if x > 0)
set_1 = read_file_to_set('1.csv')
set_2 = read_file_to_set('2.csv')
new_names = set_2 - set_1
print(new_names)