
Python CSVkit compare CSV files


I have two CSV files that look like this:

CSV1

reference  |  name  |  house
----------------------------
2348A      |  john  |  37
5648R      |  bill  |  3
RT48       |  kate  |  88
76A        |  harry |  433

CSV2

reference
---------
2348A
76A

Using Python and CSVkit, I am trying to create an output CSV containing the rows of CSV1 whose reference also appears in CSV2. Does anybody have an example they can point me to?
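The task is a set-membership filter: keep each CSV1 row whose reference value appears in CSV2. A rough sketch of that logic follows. It uses Python's standard-library csv module rather than CSVkit. The helper name filter_by_reference is hypothetical, and the sample data is inlined as strings (an assumption, so the example runs without files).

```python
import csv
import io

# Sample data from the question, inlined so the sketch is self-contained.
# With real files you would pass open("data1.csv", newline="") instead.
CSV1 = """reference,name,house
2348A,john,37
5648R,bill,3
RT48,kate,88
76A,harry,433
"""

CSV2 = """reference
2348A
76A
"""


def filter_by_reference(source, keys, out):
    """Hypothetical helper: write the rows of `source` whose 'reference'
    value appears in the 'reference' column of `keys`."""
    # Collect the wanted references into a set for fast membership tests;
    # strip() tolerates stray spaces around the values.
    wanted = {row["reference"].strip() for row in csv.DictReader(keys)}

    reader = csv.DictReader(source)
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if row["reference"].strip() in wanted:
            writer.writerow(row)


out = io.StringIO()
filter_by_reference(io.StringIO(CSV1), io.StringIO(CSV2), out)
print(out.getvalue())
```

A set is used for the CSV2 references so each CSV1 row is checked in constant time, which matters once CSV2 grows beyond a handful of entries.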


Solution

  • I would recommend using pandas to achieve what you are looking for.

    Here is how simple it is with pandas. Suppose your two CSV files look like this:

    CSV1

    reference,name,house
    2348A,john,37
    5648R,bill,3
    RT48,kate,88
    76A,harry ,433
    

    CSV2

    reference
    2348A
    76A
    

    Code

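    import pandas as pd

    # Read both files; the first row of each is used as the header
    df1 = pd.read_csv(r'd:\temp\data1.csv')
    df2 = pd.read_csv(r'd:\temp\data2.csv')
    # Inner join on 'reference': keep only rows whose reference is in both files
    df3 = pd.merge(df1, df2, on='reference', how='inner')
    # to_csv writes the row index as the first column by default (index=True)
    df3.to_csv('output.csv')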
    

    output.csv

    ,reference,name,house
    0,2348A,john,37
    1,76A,harry ,433
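A possible alternative to merge is a boolean mask built with pandas' Series.isin, sketched below under a few assumptions: the sample data is inlined via io.StringIO instead of being read from files, and the reference column is read as text (dtype str) so that codes with leading zeros are not parsed as numbers.

```python
import io

import pandas as pd

# The question's sample data, inlined so the sketch runs without files.
# With real files, pass the file paths to pd.read_csv instead.
csv1 = io.StringIO(
    "reference,name,house\n"
    "2348A,john,37\n"
    "5648R,bill,3\n"
    "RT48,kate,88\n"
    "76A,harry,433\n"
)
csv2 = io.StringIO("reference\n2348A\n76A\n")

# Read 'reference' as text in both frames so the values compare as strings.
df1 = pd.read_csv(csv1, dtype={"reference": str})
df2 = pd.read_csv(csv2, dtype={"reference": str})

# Boolean mask: True where df1's reference appears anywhere in df2.
mask = df1["reference"].isin(df2["reference"])
matched = df1[mask]

# index=False omits the unnamed row-index column seen in output.csv above.
print(matched.to_csv(index=False))
```

Unlike an inner merge, the mask never duplicates a CSV1 row when CSV2 lists the same reference twice, and it keeps only CSV1's columns. Inverting it with df1[~mask] gives the CSV1 rows that are missing from CSV2.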