Search code examples
pythonpython-3.xpandasmulti-index

Concatenate multiindex


What's the simplest way to merge 2 dataframes on both region and date?

I tried join and merge and concat. I received "'<' not supported between instances of 'float' and 'str' and "cannot handle non-unique multi-index" errors

old_df                                  old_value
      region            date      
        England       2010-01-01         4
                      2010-01-02         5
         Wales        2010-01-01         4
                      2010-01-02         3
...
new_df 

                        name         new_value
      region            date      
        England       2010-01-01         10
                      2010-01-02         10
         Wales        2010-01-01         9
                      2010-01-02         10
...

Expected Output

                                 old_value   new_value 
       region          date      
        England       2010-01-01         4         10
                      2010-01-02         5         10
         Wales        2010-01-01         4         9
                      2010-01-02         3         10

Solution

  • Works perfectly. Are you sure you have ensured your date column is a date? pd.to_datetime()

    df_old = pd.read_csv(io.StringIO("""
          region            date      old_value
            England       2010-01-01         4
                nan          2010-01-02         5
             Wales        2010-01-01         4
                nan          2010-01-02         3
    """), sep="\s+")
    
    df_new = pd.read_csv(io.StringIO("""        
          region            date       new_value
            England       2010-01-01         10
                nan          2010-01-02         10
             Wales        2010-01-01         9
                 nan         2010-01-02         10"""), sep="\s+")
    
    df_old.region = df_old.region.fillna(method="ffill")
    df_new.region = df_new.region.fillna(method="ffill")
    df_old.date = pd.to_datetime(df_old.date)
    df_new.date = pd.to_datetime(df_new.date)
    
    dfj = df_old.set_index(["region","date"]).join(df_new.set_index(["region","date"]))
    
                        old_value  new_value
    region  date                            
    England 2010-01-01          4         10
            2010-01-02          5         10
    Wales   2010-01-01          4          9
            2010-01-02          3         10