Search code examples
pythonpandasdataframe

Encountering ValueError upon joining two pandas dataframes on a datetime index column


I have two tables which I need to join on a date column. I want to preserve all the dates in both tables, with the empty rows in each table just being filled with NaNs in the final combined array. I think an outer join is what I'm looking for. So I've written this code (with data_1 and data_2 acting as mockups of my actual tables)

import pandas as pd

def main():
    data_1 = [["May-2024", 10, 5], ["June-2024", 3, 5], ["April-2015", 1, 3]]
    df1 = pd.DataFrame(data_1, columns = ["Date", "A", "B"])
    df1["Date"] = pd.to_datetime(df1["Date"], format="%B-%Y")
    print(df1)

    data_2 = [["01-11-2024", 10, 5], ["01-06-2024", 3, 5], ["01-11-2015", 1, 3]]
    df2 = pd.DataFrame(data_2, columns = ["Date", "C", "D"])
    df2["Date"] = pd.to_datetime(df2["Date"], format="%d-%m-%Y")
    print(df2)


    merged = df1.join(df2, how="outer", on=["Date"])
    print(merged)

if __name__ == "__main__":
    main()

But when I try and perform an outer join on two pandas dataframes, I get the error

ValueError: You are trying to merge on object and int64 columns for key 'Date'. If you wish to proceed you should use pd.concat

I checked the datatype of both columns by printing

print(df1["Date"].dtype, df2["Date"].dtype)

and they both seem to be

datetime64[ns] datetime64[ns]

datetimes. So I'm not quite sure why I'm getting a ValueError

Any help is appreciated, thanks.


Solution

  • You need to use merge, not join (that will use the index):

    # ensure datetime
    df1['Date'] = pd.to_datetime(df1['Date'], format='%B-%Y')
    df2['Date'] = pd.to_datetime(df2['Date'], dayfirst=True)
    
    # use merge
    merged = df1.merge(df2, how='outer', on=['Date'])
    

    For join to work:

    merged_df = df1.set_index('Date').join(df2.set_index('Date')).reset_index()
    

    Output:

            Date     A    B     C    D
    0 2015-04-01   1.0  3.0   NaN  NaN
    1 2015-11-01   NaN  NaN   1.0  3.0
    2 2024-05-01  10.0  5.0   NaN  NaN
    3 2024-06-01   3.0  5.0   3.0  5.0
    4 2024-11-01   NaN  NaN  10.0  5.0