
How to divide values in columns in one dataframe by the same value in another df in Pandas?


I want to divide all values in particular columns of the dataframe rpk by the matching value in the dataframe scaling_factor, looked up by sample name. I know how to do it for a single value (e.g. rpk.loc[0, 'P1-6'] divided by the factor for 'P1-6', as in the code below), but how do I do it for all samples at once? For example, every value in column 'P1-12' of rpk should be divided by 2 and every value in 'P1-25' by 3, according to their factors in scaling_factor.

Here is part of the input data:

import pandas as pd

rpk_data = {'P1-6': [1, 2, 3, 4],
            'P1-12': [2, 4, 6, 8],
            'P1-25': [6, 12, 3, 9]
            }
rpk = pd.DataFrame.from_dict(rpk_data)

sc_f_data = {'sample': ['P1-6', 'P1-12', 'P1-25'],
             'factor': [1, 2, 3]
             }
scaling_factor = pd.DataFrame.from_dict(sc_f_data)

divided_value = rpk.loc[0, 'P1-6'] / scaling_factor.loc[0, 'factor'] # will be 1
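Extending that single-value line to every sample could be done with a plain loop over the rows of scaling_factor. This is only a sketch, and it assumes every value in the sample column is also a column name in rpk (otherwise rpk[sample] raises a KeyError). It works, but it is more verbose than a vectorized division.

```python
import pandas as pd

rpk = pd.DataFrame({'P1-6': [1, 2, 3, 4],
                    'P1-12': [2, 4, 6, 8],
                    'P1-25': [6, 12, 3, 9]})
scaling_factor = pd.DataFrame({'sample': ['P1-6', 'P1-12', 'P1-25'],
                               'factor': [1, 2, 3]})

# Start from a copy so rpk itself is left unchanged.
out = rpk.copy()
# One division per (sample, factor) row of scaling_factor;
# each step divides a whole column rather than a single cell.
for sample, factor in zip(scaling_factor['sample'], scaling_factor['factor']):
    out[sample] = rpk[sample] / factor

print(out)
```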

The desired output should look like this:

   P1-6  P1-12  P1-25
0     1      1      2
1     2      2      4
2     3      3      1
3     4      4      3

Solution

  • Create a Series indexed by the sample column, then divide; pandas matches the Series index to the DataFrame column names:

    out = rpk / scaling_factor.set_index('sample')['factor']
    #out = rpk.div(scaling_factor.set_index('sample')['factor'])
    print(out)
       P1-6  P1-12  P1-25
    0   1.0    1.0    2.0
    1   2.0    2.0    4.0
    2   3.0    3.0    1.0
    3   4.0    4.0    3.0
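Because the division aligns on labels rather than on position, the order of the rows in scaling_factor does not matter. However, any column of rpk with no matching sample ends up as all NaN, and the column order of the result may change because pandas takes the union of the labels. The sketch below uses hypothetical data (rows reordered, 'P1-25' deliberately missing) to show this. It also shows one possible fix, reindexing the factors to rpk's columns with fill_value=1 so that unmatched columns are left unchanged. Treating a missing factor as 1 is an assumption; pick whatever default suits your data.

```python
import pandas as pd

rpk = pd.DataFrame({'P1-6': [1, 2, 3, 4],
                    'P1-12': [2, 4, 6, 8],
                    'P1-25': [6, 12, 3, 9]})

# Hypothetical factors: rows in a different order than rpk's columns,
# and no entry for 'P1-25'.
scaling_factor = pd.DataFrame({'sample': ['P1-12', 'P1-6'],
                               'factor': [2, 1]})
factors = scaling_factor.set_index('sample')['factor']

# Label alignment: 'P1-12' is still divided by 2 despite the reordering.
# 'P1-25' has no factor, so that column becomes all NaN.
out = rpk / factors
print(out)

# Assumed default: a missing factor counts as 1 (column left unchanged).
# After reindexing, the factor index equals rpk.columns, so the column order is kept.
safe = rpk / factors.reindex(rpk.columns, fill_value=1)
print(safe)
```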
    

    For integer (floor) division, use // instead of /:

    out = rpk // scaling_factor.set_index('sample')['factor']
    print(out)
       P1-6  P1-12  P1-25
    0     1      1      2
    1     2      2      4
    2     3      3      1
    3     4      4      3
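Besides the values, the two operators also differ in the dtype of the result. With these integer inputs, / gives float columns, which is why the first output shows 1.0, 2.0 and so on, while // keeps integer columns. A small sketch to check this (the factors Series is built directly here, equivalent to scaling_factor.set_index('sample')['factor']):

```python
import pandas as pd

rpk = pd.DataFrame({'P1-6': [1, 2, 3, 4],
                    'P1-12': [2, 4, 6, 8],
                    'P1-25': [6, 12, 3, 9]})
# Same labels and values as scaling_factor.set_index('sample')['factor'].
factors = pd.Series([1, 2, 3], index=['P1-6', 'P1-12', 'P1-25'])

true_div = rpk / factors    # true division
floor_div = rpk // factors  # floor division

print(true_div.dtypes)   # float64 for every column
print(floor_div.dtypes)  # an integer dtype for every column
```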
    

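    Be careful: with other data, integer division can give a different result, because // discards the fractional part. For example, if the first value of P1-25 is 5 instead of 6: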

    rpk_data = {'P1-6': [1, 2, 3, 4],
                'P1-12': [2, 4, 6, 8],
                'P1-25': [5, 12, 3, 9]
                }
    rpk = pd.DataFrame.from_dict(rpk_data)
    
    sc_f_data = {'sample': ['P1-6', 'P1-12', 'P1-25'],
                 'factor': [1, 2, 3]
                 }
    scaling_factor = pd.DataFrame.from_dict(sc_f_data)
    

    out = rpk / scaling_factor.set_index('sample')['factor']
    print(out)
       P1-6  P1-12     P1-25
    0   1.0    1.0  1.666667
    1   2.0    2.0  4.000000
    2   3.0    3.0  1.000000
    3   4.0    4.0  3.000000
    
    out = rpk // scaling_factor.set_index('sample')['factor']
    print(out)
       P1-6  P1-12  P1-25
    0     1      1      1
    1     2      2      4
    2     3      3      1
    3     4      4      3
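If you want integer results but need to know where // silently drops a fraction, one option is to check the remainder with %. It aligns on labels exactly like / and //, and it is non-zero exactly where floor division loses information. A minimal sketch using the modified data above:

```python
import pandas as pd

rpk = pd.DataFrame({'P1-6': [1, 2, 3, 4],
                    'P1-12': [2, 4, 6, 8],
                    'P1-25': [5, 12, 3, 9]})
factors = pd.Series([1, 2, 3], index=['P1-6', 'P1-12', 'P1-25'])

# Non-zero remainder means the floor-division result was truncated.
remainder = rpk % factors
lossy = remainder.ne(0)

print(lossy)              # only row 0 of 'P1-25' is True (5 % 3 == 2)
print(lossy.any().any())  # True
```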