Search code examples
pandascomparisonmultiple-columnspython-3.9

Python Pandas Match 2 columns from 2 files to fill values in a file


Please help filling values in "file1.csv" (daily data) from "file2.csv" (weekly data).

Below is "file1.csv"(daily data)::

date,value,week_num,year,fill_col_1,fill_col_2
01-01-2018,1763.95,1,2018,,
02-01-2018,1736.2,1,2018,,
03-01-2018,1741.1,1,2018,,
04-01-2018,1779.95,1,2018,,
05-01-2018,1801.1,1,2018,,
08-01-2018,1816,2,2018,,
09-01-2018,1823,2,2018,,
10-01-2018,1812.05,2,2018,,
11-01-2018,1825,2,2018,,
12-01-2018,1805,2,2018,,

And below is "file2.csv"(weekly data)::

date,value,week_num,year,fill_col_1,fill_col_2
07-01-2018,1764.46,1,2018,1768.953333,1756.542153
14-01-2018,1816.21,2,2018,1811.966667,1801.030007

The 2 columns to be filled in "file1.csv" is "fill_col_1" and "fill_col_2", by matching "week_num" and "year" from "file2.csv".

Is there some way where 2 columns from 2 files can be compared, with or without considering indexes("dates") which are obviously different in both files.

(If you see, the "file2.csv"(weekly) was derived using the 'resample' function in pandas, based on (daily values from) "file1.csv".. But now I am unable to join/concatentate the 2 files based on match of multiple columns/conditions.)

The expected output/result, needs to be as below:

date,value,match_col_1,match_col_2,fill_col_1,fill_col_2
01-01-2018,1763.95,1,2018,1768.953333,1756.542153
02-01-2018,1736.2,1,2018,1768.953333,1756.542153
03-01-2018,1741.1,1,2018,1768.953333,1756.542153
04-01-2018,1779.95,1,2018,1768.953333,1756.542153
05-01-2018,1801.1,1,2018,1768.953333,1756.542153
08-01-2018,1816,2,2018,1811.966667,1801.030007
09-01-2018,1823,2,2018,1811.966667,1801.030007
10-01-2018,1812.05,2,2018,1811.966667,1801.030007
11-01-2018,1825,2,2018,1811.966667,1801.030007
12-01-2018,1805,2,2018,1811.966667,1801.030007

The code I have tried is as below: (which is absolutely rookie thought process, as I see now)

df_daily = pd.read_csv("file1.csv")
df_weekly = pd.read_csv("file2.csv")
df_weekly.loc[df_weekly["year"]]==df_daily.loc[df_daily["year"]] & df_weekly.loc[df_weekly["week_num"]]==df_daily.loc[df_daily["week_num"]]

which gives an error: KeyError: "None of [Int64Index([2018, 2018, 2018, 2018, 2018, 2018, 2018, ],\n dtype='int64', length=8)] are in the [index]"


Solution

  • Try the pandas.DataFrame.merge func:

    import pandas as pd
    
    df_daily = pd.read_csv('1.csv').iloc[:,:4]
    df_weekly = pd.read_csv('2.csv').iloc[:,2:]
    print(df_daily.merge(df_weekly,on=['week_num','year'],how='left'))