Search code examples
pythonpandascsv

compare 2 csv files pandas by column


I have 2 csv files

one.csv

Supplier Code Cost Price
1 2.25
2 1.25
3 3.25

two.csv

Supplier Code Cost Price
1 2.25
2 1.75
3 3.75
5 1.25

So one.csv is compared to two.csv

ONE.csv has about 500 values in 2 columns. TWO.csv has 10000 values in 2 columns. The shape of the files is same. Code column does not change, but is either present or absent in ONE.csv/TWO.csv

What I am looking to do is:

  1. Find subset of the all Supplier Code in ONE.csv that are present in TWO.csv, with Cost Price updated from TWO.csv

  2. Output that to CSV file

Expected output:

Supplier Code,Cost Price
1,2.25
2,1.75
3,3.75

This is what I have tried...

one = pd.read_csv(ONE_PATH)
two = pd.read_csv(TWO_PATH)

present_in_two = one[one['Supplier Code'].isin(two['Supplier Code'])]

print(present_in_two.head())

I get:

Empty DataFrame
Columns: [Supplier Code, Cost Price]
Index: []

Edit: Corrected typo..


Solution

  • This works for your use case.

    Merge the two dataframes on key Supplier Code and select only the key column from df_one when merging.

    The data form df_two would be selected which also exist on df_one

    dict_one = {"Supplier Code": [1, 2, 3], "Cost Price": [2.25, 1.25, 3.25]}
    dict_two = {"Supplier Code": [1, 2, 3, 5], "Cost Price": [2.25, 1.75, 3.75, 1.25]}
    
    
    df_one = pd.DataFrame(dict_one)
    df_two = pd.DataFrame(dict_two)
    
    df_merged = pd.merge(df_one[["Supplier Code"]], df_two, on="Supplier Code")
    df_merged
    
            Supplier Code   Cost Price
        0   1               2.25
        1   2               1.75
        2   3               3.75