Search code examples
pythondataframejoinmerge

joining two dataframes on column value and column name


I have two dataframes that I am trying to join in some way to produce a third dataframe.

I am doing this in python.

df1:

Date A B C
27/04/2023 00:00 55.6637 0.32194 0.145006
28/04/2023 00:00 57.1432 0.327949 0.14712
29/04/2023 00:00 57.3842 0.322167 0.146119
30/04/2023 00:00 57.1259 0.320318 0.145881

df2:

Date change Ticker
27/04/2023 00:00 0.1 A
28/04/2023 00:00 -0.1 A
29/04/2023 00:00 0.2 A
30/04/2023 00:00 -0.2 A
27/04/2023 00:00 0.3 B
28/04/2023 00:00 -0.3 B
29/04/2023 00:00 0.4 B
30/04/2023 00:00 -0.4 B
27/04/2023 00:00 0.5 C
28/04/2023 00:00 -0.5 C
29/04/2023 00:00 0.6 C
30/04/2023 00:00 -0.6 C

I want to join the two dataframes so that the value in df1 are joined to df2 on Date and where the ticker in df2 is equal to the column name in df1.

The result should look like this:

Date change Ticker Px
27/04/2023 00:00 0.1 A 55.6637
28/04/2023 00:00 -0.1 A 57.1432
29/04/2023 00:00 0.2 A 57.3842
30/04/2023 00:00 -0.2 A 57.1259
27/04/2023 00:00 0.3 B 0.32194
28/04/2023 00:00 -0.3 B 0.327949
29/04/2023 00:00 0.4 B 0.322167
30/04/2023 00:00 -0.4 B 0.320318
27/04/2023 00:00 0.5 C 0.145006
28/04/2023 00:00 -0.5 C 0.14712
29/04/2023 00:00 0.6 C 0.146119
30/04/2023 00:00 -0.6 C 0.145881

I have tried using .iloc[] but I couldnt get it to work


Solution

  • Create DataFrames:

    import pandas as pd
    
    dates = ['27/04/2023 00:00', '28/04/2023 00:00']
    df1 = pd.DataFrame({'Date': dates,'A': [55.6, 57.2],'B': [0.31, 0.32]})
    df2 = pd.DataFrame({'Date': dates*2, 'change': list('zxcv'),'Ticker': list('AABB')})
    

    Actual soulton:

    df3 = pd.melt(df1, id_vars=['Date'], value_vars=df2.Ticker.unique(), var_name='Ticker', value_name='Px')
    df_out = pd.merge(df2, df3, on=['Date','Ticker'], how='inner')
    
    1. In the df3 function melt unpivot df1 from wide to long format.

    2. df_out merged df1 and df3 together on columns Date and Ticker.

    df1:

    Date A B
    27/04/2023 00:00 55.6 0.31
    28/04/2023 00:00 57.2 0.32

    df2:

    Date change Ticker
    27/04/2023 00:00 z A
    28/04/2023 00:00 x A
    27/04/2023 00:00 c B
    28/04/2023 00:00 v B

    df3:

    Date Ticker Px
    27/04/2023 00:00 A 55.6
    28/04/2023 00:00 A 57.2
    27/04/2023 00:00 B 0.31
    28/04/2023 00:00 B 0.32

    df_out

    Date change Ticker Px
    27/04/2023 00:00 z A 55.6
    28/04/2023 00:00 x A 57.2
    27/04/2023 00:00 c B 0.31
    28/04/2023 00:00 v B 0.32