Search code examples
pythonpython-3.xpandascsvexport-to-csv

how to compare two csv file in python and flag the difference?


i am new to python. Kindly help me. Here I have two set of csv-files. i need to compare and output the difference like changed data/deleted data/added data. here's my example

file 1:
Sn  Name  Subject   Marks  
1   Ram      Maths     85
2   sita    Engilsh    66
3   vishnu  science    50
4   balaji  social     60

file 2:
Sn  Name    Subject   Marks
1   Ram     computer  85   #subject name have changed
2   sita    Engilsh   66
3   vishnu  science   90   #marks have changed
4   balaji  social    60
5   kishor  chem      99   #added new line

Output - i need to get like this :

Changed Items: 
1   Ram      computer  85
3   vishnu    science  90
Added item:
5   kishor    chem   99
Deleted item:
.................

I imported csv and done the comparasion via for loop with redlines. I am not getting the desire output. its confusing me a lot when flagging the added & deleted items between file 1 & file2 (csv files). pl suggest the effective code folks.


Solution

  • The idea here is to flatten your dataframe with melt to compare each value:

    # Load your csv files
    df1 = pd.read_csv('file1.csv', ...)
    df2 = pd.read_csv('file2.csv', ...)
    
    # Select columns (not mandatory, it depends on your 'Sn' column)
    cols = ['Name', 'Subject', 'Marks']
    
    # Flat your dataframes
    out1 = df1[cols].melt('Name', var_name='Item', value_name='Old')
    out2 = df2[cols].melt('Name', var_name='Item', value_name='New')
    out = pd.merge(out1, out2, on=['Name', 'Item'], how='outer')
    
    # Flag the state of each item
    condlist = [out['Old'] != out['New'],
                out['Old'].isna(),
                out['New'].isna()]
    
    out['State'] = np.select(condlist, choicelist=['changed', 'added', 'deleted'], 
                             default='unchanged')
    

    Output:

    >>> out
         Name     Item      Old       New      State
    0     Ram  Subject    Maths  computer    changed
    1    sita  Subject  Engilsh   Engilsh  unchanged
    2  vishnu  Subject  science   science  unchanged
    3  balaji  Subject   social    social  unchanged
    4     Ram    Marks       85        85  unchanged
    5    sita    Marks       66        66  unchanged
    6  vishnu    Marks       50        90    changed
    7  balaji    Marks       60        60  unchanged
    8  kishor  Subject      NaN      chem    changed
    9  kishor    Marks      NaN        99    changed