Old csv file
Column1 (Column name: I will not compare the column name. The csv is not in order)
AA101
BB101
CC101
DD101
EE101
New csv file
Column2 (Column name: I will not compare the column name. The csv is not in order)
AA101
CC101
BB101
DD102
EE102
Expect result file:
Different:
Old
DD101 (it is not in the New file)
EE101 (it is not in the New file)
New
DD102 (it is not in the Old file)
DD101 (it is not in the Old file)
I reference this post and create the following code Comparing CSV matching rows with Python
import csv
Source_filename = "E:\Path\Source1.csv"
Target_filename = "E:\Path\Target1.csv"
output_filename = "E:results.csv"
# Load all the entries from Source into a set for quick lookup.
source_ids = set()
with open(Source_filename, 'r') as f:
big_ip = csv.reader(f)
for csv_row in big_ip:
source_ids.add(csv_row[0])
# print source_ids
with open(Source_filename, 'r') as input_file, open(output_filename, 'w') as output_file:
input_csv = csv.reader(input_file)
output_csv = csv.writer(output_file)
for csv_row in input_csv:
ip = csv_row[0]
status = "Present" if ip in source_ids else "Not Present"
output_csv.writerow([ip, status + " in Source.csv"])
The code out put is both same and different from the source. I need out put for different only from both the source and the target
One option is to use Pandas. There are many ways to do this, here is one that will give you a complete list of all records with an "Indicator" column set to "both" (if the record appears in both files), "left_only" (if in old file), or "right_only" (if in new file). More on Pandas merge here:
import pandas as pd
old = pd.read_csv('old_file.csv')
new = pd.read_csv('new_file.csv')
output = old.merge(
new,
left_on='old_column_name',
right_on='new_column_name',
how='outer',
indicator=True,
)
output.to_csv('output.csv')
You can also filter the indicator before saving to csv:
output[output['_merge'] != 'both'].to_csv('output.csv')