Search code examples
pythonif-statementstring-comparison

Comparing CSV and print out the different row from both the source and the target with Python


Old csv file

Column1 (Column name: I will not compare the column name. The csv is not in order)

AA101
BB101
CC101
DD101
EE101

New csv file

Column2 (Column name: I will not compare the column name. The csv is not in order)

AA101
CC101
BB101
DD102
EE102

Expect result file:

Different:
Old
DD101 (it is not in the New file)
EE101 (it is not in the New file)
New
DD102 (it is not in the Old file)
DD101 (it is not in the Old file)

I reference this post and create the following code Comparing CSV matching rows with Python

import csv

Source_filename = "E:\Path\Source1.csv"
Target_filename = "E:\Path\Target1.csv"
output_filename = "E:results.csv"

# Load all the entries from Source into a set for quick lookup.
source_ids = set()

with open(Source_filename, 'r') as f:
    big_ip = csv.reader(f)
    for csv_row in big_ip:
        source_ids.add(csv_row[0])

# print source_ids

with open(Source_filename, 'r') as input_file, open(output_filename, 'w') as output_file:
    input_csv = csv.reader(input_file)
    output_csv = csv.writer(output_file)
    for csv_row in input_csv:
        ip = csv_row[0]
        status = "Present" if ip in source_ids else "Not Present"
        output_csv.writerow([ip, status + " in Source.csv"])

The code out put is both same and different from the source. I need out put for different only from both the source and the target


Solution

  • One option is to use Pandas. There are many ways to do this, here is one that will give you a complete list of all records with an "Indicator" column set to "both" (if the record appears in both files), "left_only" (if in old file), or "right_only" (if in new file). More on Pandas merge here:

    import pandas as pd
    
    old = pd.read_csv('old_file.csv')
    new = pd.read_csv('new_file.csv')
    output = old.merge(
        new,
        left_on='old_column_name',
        right_on='new_column_name',
        how='outer',
        indicator=True,
    )
    output.to_csv('output.csv')
    

    You can also filter the indicator before saving to csv:

    output[output['_merge'] != 'both'].to_csv('output.csv')