Search code examples
pythondataframecomparecomparisonsklearn-pandas

Comparing two Dataframes and creating a third one where certain contions are met


I am trying to compare two different dataframe that have the same column names and indexes(not numerical) and I need to obtain a third df with the biggest value for the row with the same column name.

Example

df1=

|  | col_1 | col2 | col-3 |

| rft_12312 | 4 | 7 | 4 |

| rft_321321 | 3 | 4 | 1 |
df2=

|  | col_1 | col2 | col-3 |

| rft_12312 | 7 | 3 | 4 |

| rft_321321 | 3 | 7 | 6 |

Required result

|  | col_1 | col2 | col-3 |

| rft_12312 | 7 (because df2.value in this \[row :column\] \>df1.value) | 7 | 4 |


| rft_321321 | 3(when they are equal doesn't matter from which column is the value) | 7 | 6 |

I've already tried pd.update with filter_func defined as:

def filtration_function(val1,val2): if val1 >= val2: return val1 else: return val2

but is not working. I need the check for each column with same name.

also pd.compare but does not allow me to pick the right values.

Thank you in advance :)


Solution

  • I think one possibility would be to use "combine". This method generates an element-wise comparsion between the two dataframes and returns the maximum value of each element.

    Example:

    import pandas as pd
    
    def filtration_function(val1, val2):
        return max(val1, val2)
    
    result = df1.combine(df2, filtration_function)
    

    I think method "where" can work to:

    import pandas as pd
    
    result = df1.where(df1 >= df2, df2)