Pandas dataframe, comparing weighted values across multiple dataframes

I am attempting to compute a weighted sum across 3 or more different dataframes.

Each of the 3 dataframes has the same form.

df1 = pd.DataFrame([
    {'rowid':1,'predict1': 'choice1', 'predict2': 'choice2', 'predict3': 'choice3'},
    {'rowid':2,'predict1': 'choice4', 'predict2': 'choice5', 'predict3': 'choice1'},
])

df2 = pd.DataFrame([
    {'rowid':1,'predict1': 'choice1', 'predict2': 'choice3', 'predict3': 'choice4'},
    {'rowid':2,'predict1': 'choice1', 'predict2': 'choice5', 'predict3': 'choice4'},
])

df3 = pd.DataFrame([
    {'rowid':1,'predict1': 'choice2', 'predict2': 'choice3', 'predict3': 'choice1'},
    {'rowid':2,'predict1': 'choice4', 'predict2': 'choice1', 'predict3': 'choice6'},
])

I am trying to use this data to compute a tally based on a weight for each dataframe and a weight for each prediction column. For example, the weights might be:

weights_dataframe = { 'df1': 1.1, 'df2': 1.2, 'df3': 0.9 }
weights_predictions= { 'predict1': 1.0, 'predict2': 0.5, 'predict3': 0.333 }

Each row would have a separate tally based on all dataframes. For example, the tally for 'choice1' at 'rowid': 1 would be:

tally_row1_choice1 = 1.1*1.0 + 1.2*1.0 + 0.9*0.333
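
As a quick check of that arithmetic, here is a minimal sketch that rebuilds the same tally from the two weight dictionaries. The list `hits` is an illustrative name, not part of the original data; it simply records where 'choice1' appears for rowid 1 (predict1 in df1 and df2, predict3 in df3).

```python
weights_dataframe = {'df1': 1.1, 'df2': 1.2, 'df3': 0.9}
weights_predictions = {'predict1': 1.0, 'predict2': 0.5, 'predict3': 0.333}

# (dataframe, prediction column) pairs where 'choice1' appears for rowid 1.
# 'hits' is a hypothetical helper written out by hand for illustration.
hits = [('df1', 'predict1'), ('df2', 'predict1'), ('df3', 'predict3')]

# Each occurrence contributes dataframe weight * prediction weight.
tally_row1_choice1 = sum(
    weights_dataframe[d] * weights_predictions[p] for d, p in hits
)
print(round(tally_row1_choice1, 4))  # 2.5997
```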

Based on this operation, I am trying to generate a new dataframe, result, that shows the top 3 choices for each row (highest sum to third-highest sum).

Ideally, I want to do something like this:

tally = getTop3ForEachRow(df1,df2,df3)

result = pd.DataFrame([
    {'rowid':1, 'predict1': tally[0][0], 'predict2': tally[0][1], 'predict3': tally[0][2] },
    {'rowid':2, 'predict1': tally[1][0], 'predict2': tally[1][1], 'predict3': tally[1][2] }
])

What would be the Pythonic way to implement getTop3ForEachRow()? Is it possible to do this as a dataframe formula? Would NumPy be the appropriate level to tackle this type of tabulation?

Solution

def getTop3ForEachRow(df1, df2, df3):

    df = pd.concat([d.set_index('rowid') for d in [df1, df2, df3]],
                   keys=['df1', 'df2', 'df3'])

    wghts_df = pd.DataFrame([1.1, 1.2, 0.9], ['df1', 'df2', 'df3'])
    wghts_pr = pd.DataFrame([1.0, 0.5, 0.333], ['predict1', 'predict2', 'predict3']).T
    wghts = wghts_df.dot(wghts_pr)

    wghts_by_group = df.groupby(level='rowid').apply(lambda x: wghts).unstack(0).stack()
    bdf = pd.concat([df, wghts_by_group], axis=1, keys=['choices', 'weights'])

    bdf1 = bdf.stack().set_index('choices', append=True)
    bdf2 = bdf1.groupby(level=[1, 3]).sum().unstack(0)
    sort = lambda x: x.sort_values(ascending=False).index
    return bdf2.apply(sort).reset_index(drop=True).head(3).values.T
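
To make the weight table used inside the function easier to picture, here is a small sketch that runs only the three weight lines on their own. It uses the same values and names as the function, and the printed table has one row per dataframe and one column per prediction.

```python
import pandas as pd

# Column vector of dataframe weights, indexed by dataframe name.
wghts_df = pd.DataFrame([1.1, 1.2, 0.9], ['df1', 'df2', 'df3'])
# Row vector of prediction weights (transposed so the labels become columns).
wghts_pr = pd.DataFrame([1.0, 0.5, 0.333], ['predict1', 'predict2', 'predict3']).T

# Column times row gives every dataframe weight * prediction weight pair.
wghts = wghts_df.dot(wghts_pr)
print(wghts)
```

Each cell is the weight one occurrence of a choice contributes, for example `wghts.loc['df3', 'predict3']` is 0.9 * 0.333.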

Demonstration

tally = getTop3ForEachRow(df1, df2, df3)

result = pd.DataFrame([
    {'rowid':1, 'predict1': tally[0][0], 'predict2': tally[0][1], 'predict3': tally[0][2] },
    {'rowid':2, 'predict1': tally[1][0], 'predict2': tally[1][1], 'predict3': tally[1][2] }
]) 

print(result)

  predict1 predict2 predict3  rowid
0  choice1  choice2  choice3      1
1  choice4  choice1  choice5      2

Explanation

def getTop3ForEachRow(df1, df2, df3):
    # concat all 3 dataframes one after the other while setting
    # the rowid as the index
    df = pd.concat([d.set_index('rowid') for d in [df1, df2, df3]],
                   keys=['df1', 'df2', 'df3'])

    # wghts_df is a column, wghts_pr is a row.
    # the dot product will give all cross-multiplied values.
    wghts_df = pd.DataFrame([1.1, 1.2, 0.9], ['df1', 'df2', 'df3'])
    wghts_pr = pd.DataFrame([1.0, 0.5, 0.333], ['predict1', 'predict2', 'predict3']).T
    wghts = wghts_df.dot(wghts_pr)

    # I just want to set all cross multiplied weights side
    # by side with each rowid
    wghts_by_group = df.groupby(level='rowid').apply(lambda x: wghts).unstack(0).stack()
    bdf = pd.concat([df, wghts_by_group], axis=1, keys=['choices', 'weights'])

    # pivot ['predict1', 'predict2', 'predict3'] into index
    # append to index, 'choices'
    bdf1 = bdf.stack().set_index('choices', append=True)
    # groupby rowid and choices
    bdf2 = bdf1.groupby(level=[1, 3]).sum().unstack(0)
    # sort descending, take index value (the choice) take top 3
    sort = lambda x: x.sort_values(ascending=False).index
    return bdf2.apply(sort).reset_index(drop=True).head(3).values.T
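
For comparison, here is an alternative sketch of the same tally, not the approach used above. It reshapes every dataframe to long format with melt, looks up both weights with map, and then sums per rowid and choice. The names `top3_by_melt`, `frames`, `source`, and `slot` are made up for this illustration, and it assumes a pandas version that provides DataFrame.melt.

```python
import pandas as pd

df1 = pd.DataFrame([
    {'rowid': 1, 'predict1': 'choice1', 'predict2': 'choice2', 'predict3': 'choice3'},
    {'rowid': 2, 'predict1': 'choice4', 'predict2': 'choice5', 'predict3': 'choice1'},
])
df2 = pd.DataFrame([
    {'rowid': 1, 'predict1': 'choice1', 'predict2': 'choice3', 'predict3': 'choice4'},
    {'rowid': 2, 'predict1': 'choice1', 'predict2': 'choice5', 'predict3': 'choice4'},
])
df3 = pd.DataFrame([
    {'rowid': 1, 'predict1': 'choice2', 'predict2': 'choice3', 'predict3': 'choice1'},
    {'rowid': 2, 'predict1': 'choice4', 'predict2': 'choice1', 'predict3': 'choice6'},
])

frames = {'df1': df1, 'df2': df2, 'df3': df3}
weights_dataframe = {'df1': 1.1, 'df2': 1.2, 'df3': 0.9}
weights_predictions = {'predict1': 1.0, 'predict2': 0.5, 'predict3': 0.333}


def top3_by_melt(frames, weights_dataframe, weights_predictions, n=3):
    # Long format: one row per (source dataframe, rowid, prediction slot, choice).
    long = pd.concat(
        [d.melt(id_vars='rowid', var_name='slot', value_name='choice')
          .assign(source=name)
         for name, d in frames.items()],
        ignore_index=True,
    )
    # Weight of each occurrence = dataframe weight * prediction weight.
    long['weight'] = (long['source'].map(weights_dataframe)
                      * long['slot'].map(weights_predictions))
    # Sum the weights per rowid and choice, then rank within each rowid.
    totals = long.groupby(['rowid', 'choice'], as_index=False)['weight'].sum()
    totals = totals.sort_values(['rowid', 'weight'], ascending=[True, False])
    # Keep the n heaviest choices per rowid, as a list in ranked order.
    return totals.groupby('rowid').head(n).groupby('rowid')['choice'].apply(list)


top = top3_by_melt(frames, weights_dataframe, weights_predictions)
print(top)
```

With the sample data, `top.loc[1]` and `top.loc[2]` hold the same choices as the result table in the demonstration. Passing the weights in as arguments, rather than hard-coding them, also lets the same function handle more than three dataframes.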