Search code examples
pythonpandasdataframemultiplication

Trying to multiply specific columns, by a portion of multiple rows in Pandas DataFrame (Python)


I am trying to multiply a few specific columns by a portion of multiple rows and creating a new column from every result. I could not really find an answer to my question in previous stackoverflow questions or on google, so maybe one of you can help.

I would like to point out that I am quite the beginner in Python, so apologies ahead for any obvious questions or strange code.

This is how my DataFrame currently looks like: enter image description here

So, for the column Rank of Hospital by Doctor_1, I want to multiply all its numbers by the values of the first row of column Rank of Doctor by Hospital_1 until column Rank of Doctor by Hospital_10. Which would result in:

1*1
2*1
3*1
4*4
...

and so on. I want to do this for every Doctor_ column. So for Doctor_2 its values should be multiplied by the second row of all those ten columns (Rank of Doctor by Hospital_. Doctor_3, multiplied by the third row etc.

So far, I have transposed the Rank of Doctor by Hospital_ columns in a new DataFrame: enter image description here

and tried to multiply this by a DataFrame of the Rank of Hospital by Doctor_ columns. Here the first column of the first df should be multiplied by the first column of the second df. (and second column * second column, etc.): enter image description here

But my current formula

preferences_of_doctors_and_hospitals_doctors_ranking.mul(preferences_of_doctors_and_hospitals_hospitals_ranking_transposed)

is obviously not working: enter image description here

Does anybody know what I am doing wrong and how I could fix this? Maybe I could write a for loop so that a new column is created for every multiplication of columns? So Multiplication_column_1 of DF3 = Column 1 of DF1 * Column 1 of DF2 and Multiplication_column_2 of DF3 = Column 2 of DF1 * Column 2 of DF2.

Thank you in advance!

Jeff


Solution

  • You can multiple 2d arrays created by filtering column with filter and values first:

    arr = df.filter(like='Rank of Hospital by').values * df.filter(like='Rank of Doctor by').values
    

    Or:

    arr = (preferences_of_doctors_and_hospitals_doctors_ranking.values * 
           preferences_of_doctors_and_hospitals_hospitals_ranking_transposed.values)
    

    Notice - necessary is same ordering of columns, same length of columns names and index in both filtered DataFrames.

    Get 2d array, so create DataFrame by constructor and join to original:

    df = df.join(pd.DataFrame(arr, index=df.index).add_prefix('Multiplied '))