Search code examples
pythonpandaspandas-loc

How to transform and combine two DataFrames


I'm trying to assign dataframe columns to the row and column of the .loc function. I have a DataFrame df with no set index, with Sites and Visits as my column headers

Index Site Visit
0 101 Visit 1
1 102 Visit 1
2 102 Visit 2

I have another dataframe df2 with Sites as my dataframe index, and Visits as my columns with Cost as my values.

Index Visit 1 Visit 2
101 50 60
102 100 120

I'm trying to use .loc to index the Cost from df2 and add it as a column to df1 like so

Index Site Visit Cost
0 101 Visit 1 50
1 102 Visit 1 60
2 102 Visit 2 120

I tried using the following code to provide a row and column value

df['Cost'] = df2.loc[df['Site'],df['Visit']]

But I got the following error:

KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. 
The following labels were missing: etc. etc. 

Any idea how to use two column values from one DataFrame as the .loc row and column values?


Solution

    • This is a question about merging, but the dataframes should be transformed, in order to properly merge them.

    Transform the dataframes

    • Transform df2 by stacking 'Visit 1' and 'Visit 2' into a single column, with the values in an adjacent column
    • Technically, the columns do not need to be renamed, as is done below. However, I think it makes it easier when creating the .merge()
    import pandas as pd
    
    # create df1
    df1 = pd.DataFrame({'Index': ['0', '1', '2'], 'Site': ['101', '102', '102'], 'Visit': ['Visit 1', 'Visit 1', 'Visit 2']})
    
    # drop the Index column
    df1.drop(columns=['Index'], inplace=True)
    
    # display(df1)
      Site    Visit
    0  101  Visit 1
    1  102  Visit 1
    2  102  Visit 2
    
    # create df2
    df2 = pd.DataFrame({'Index': ['101', '102'], 'Visit 1': ['50', '100'], 'Visit 2': ['60', '120']})
    
    # display(df2)
      Index Visit 1 Visit 2
    0   101      50      60
    1   102     100     120
    
    # stack Visit 1 and Visit 2 into a single column
    df2 = df2.set_index('Index').stack().reset_index()
    
    # rename the columns
    df2.columns = ['Site', 'Visit', 'Cost']
    
    # display(df2)
      Site    Visit Cost
    0  101  Visit 1   50
    1  101  Visit 2   60
    2  102  Visit 1  100
    3  102  Visit 2  120
    

    Merge the dataframes

    cost = df1.merge(df2, on=['Site', 'Visit'])
    
    # display(cost)
      Site    Visit Cost
    0  101  Visit 1   50
    1  102  Visit 1  100
    2  102  Visit 2  120