What's the simplest way to merge 2 dataframes on both region and date?
I tried join and merge and concat. I received "'<' not supported between instances of 'float' and 'str' and "cannot handle non-unique multi-index" errors
old_df old_value
region date
England 2010-01-01 4
2010-01-02 5
Wales 2010-01-01 4
2010-01-02 3
...
new_df
name new_value
region date
England 2010-01-01 10
2010-01-02 10
Wales 2010-01-01 9
2010-01-02 10
...
Expected Output
old_value new_value
region date
England 2010-01-01 4 10
2010-01-02 5 10
Wales 2010-01-01 4 9
2010-01-02 3 10
Works perfectly. Are you sure you have ensured your date column is a date? pd.to_datetime()
df_old = pd.read_csv(io.StringIO("""
region date old_value
England 2010-01-01 4
nan 2010-01-02 5
Wales 2010-01-01 4
nan 2010-01-02 3
"""), sep="\s+")
df_new = pd.read_csv(io.StringIO("""
region date new_value
England 2010-01-01 10
nan 2010-01-02 10
Wales 2010-01-01 9
nan 2010-01-02 10"""), sep="\s+")
df_old.region = df_old.region.fillna(method="ffill")
df_new.region = df_new.region.fillna(method="ffill")
df_old.date = pd.to_datetime(df_old.date)
df_new.date = pd.to_datetime(df_new.date)
dfj = df_old.set_index(["region","date"]).join(df_new.set_index(["region","date"]))
old_value new_value
region date
England 2010-01-01 4 10
2010-01-02 5 10
Wales 2010-01-01 4 9
2010-01-02 3 10