python pandas merge concatenation

How do I merge data between two pandas data frames when one data frame has duplicate index values?


I have two data frames loaded into Pandas. Each data frame holds property information indexed by a 'pin' unique to a particular parcel of land.

The first data frame (df1) holds historic sales data. Because a property can be sold multiple times, index values (the 'pin') repeat: each sale is a row with the parcel's 'pin' as its index value. If a property was sold once in the data set, its 'pin' appears once; if it was sold 5 times, its 'pin' appears 5 times.

The second data frame (df2) is a property record. It is also indexed by the parcel pin, but because it holds one row per property, the value_counts() for each index value is 1 (i.e. index values do not repeat).

I would like to add data to df1 or create a new data frame which keeps all data from df1 intact, but adds values from df2 based upon matching index values.

For example: df1 has columns ['SALE_YEAR', 'SALE_VALUE'], and multiple rows can share the same index value. df2 has columns ['Address', 'SQFT'], and its index values are all unique. I want to add the 'Address' and 'SQFT' values to df1 by matching index values.
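A minimal sketch of the two frames described above, with made-up pins and values standing in for the real data:

```python
import pandas as pd

# Made-up values, for illustration only.
df1 = pd.DataFrame(
    {"SALE_YEAR": [2001, 2010, 2005], "SALE_VALUE": [100000, 150000, 90000]},
    index=[101, 101, 202],  # pin 101 was sold twice
)
df2 = pd.DataFrame(
    {"Address": ["1 Main St", "2 Oak Ave"], "SQFT": [1200, 950]},
    index=[101, 202],  # one row per pin
)

print(df1.index.is_unique)  # False
print(df2.index.is_unique)  # True
```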

merge() and concat() do not seem to work. I believe this is because they have a hard time matching df2 values to multiple df1 rows.

Thank you for the help.


Solution

  • Try this, assuming 'PIN' is a column (or a named index level) in both data frames:

    import pandas as pd
    merged_df = pd.merge(left=df1, right=df2, on='PIN', how='left')
    

    If that still isn't working, the dtypes of the PIN columns may not match:

    # Cast both PIN columns to the same dtype so the keys compare equal.
    df1['PIN'] = df1['PIN'].astype(int)
    df2['PIN'] = df2['PIN'].astype(int)
    
    merged_df = pd.merge(left=df1, right=df2, on='PIN', how='left')
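
Because the question describes the pin as the index rather than a column, an index-based left join may be closer to what is needed. The following is a minimal sketch, assuming both frames are indexed by pin; the sample pins and values are made up for illustration and are not from the original data:

```python
import pandas as pd

# Hypothetical sample data; the pin values and column contents are made up.
df1 = pd.DataFrame(
    {"SALE_YEAR": [2001, 2010, 2005], "SALE_VALUE": [100000, 150000, 90000]},
    index=pd.Index([101, 101, 202], name="pin"),
)
df2 = pd.DataFrame(
    {"Address": ["1 Main St", "2 Oak Ave"], "SQFT": [1200, 950]},
    index=pd.Index([101, 202], name="pin"),
)

# Left join on the index: every df1 row is kept, and the matching df2 row
# is repeated for each sale of the same pin.
merged_df = df1.join(df2, how="left")

# Equivalent merge() call; validate="m:1" raises MergeError if df2's index
# turns out to contain duplicates.
merged_alt = pd.merge(
    df1, df2, left_index=True, right_index=True, how="left", validate="m:1"
)
```

Duplicate index values on the left side are not a problem for a left join; each repeated pin simply picks up the same df2 row. Pins that have no match in df2 get NaN in the 'Address' and 'SQFT' columns.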