I want to divide all values in particular columns of the dataframe rpk by the corresponding value from the dataframe scaling_factor, matched by sample name. I know how to do it for a single value (e.g. for the column 'P1-6' in rpk, all the values should be divided by 1, the factor value for 'P1-6' in the scaling_factor dataframe), but how do I do it for all samples?
Here is part of the input data:
import pandas as pd
rpk_data = {'P1-6': [1, 2, 3, 4],
'P1-12': [2, 4, 6, 8],
'P1-25': [6, 12, 3, 9]
}
rpk = pd.DataFrame.from_dict(rpk_data)
sc_f_data = {'sample': ['P1-6', 'P1-12', 'P1-25'],
'factor': [1, 2, 3]
}
scaling_factor = pd.DataFrame.from_dict(sc_f_data)
divided_value = rpk.loc[0, 'P1-6'] / scaling_factor.loc[0, 'factor'] # will be 1
The desired output should look like this:
P1-6 P1-12 P1-25
0 1 1 2
1 2 2 4
2 3 3 1
3 4 4 3
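Create a Series indexed by the sample column and divide. The Series index is aligned with the column names of rpk, so each column is divided by its own factor: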
out = rpk / scaling_factor.set_index('sample')['factor']
#out = rpk.div(scaling_factor.set_index('sample')['factor'])
print (out)
P1-6 P1-12 P1-25
0 1.0 1.0 2.0
1 2.0 2.0 4.0
2 3.0 3.0 1.0
3 4.0 4.0 3.0
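Because the division matches labels rather than positions, the order of rows in scaling_factor should not matter, and a sample without a factor ends up as NaN. A minimal sketch of how that could be checked before dividing, assuming a hypothetical extra column 'P1-50' that has no factor and a shuffled scaling_factor (neither is part of the question's data):

```python
import pandas as pd

# Question's data plus a hypothetical 'P1-50' column that has no factor.
rpk = pd.DataFrame({'P1-6': [1, 2, 3, 4],
                    'P1-12': [2, 4, 6, 8],
                    'P1-25': [6, 12, 3, 9],
                    'P1-50': [5, 5, 5, 5]})

# Same factors as in the question, deliberately listed in a different order.
scaling_factor = pd.DataFrame({'sample': ['P1-25', 'P1-6', 'P1-12'],
                               'factor': [3, 1, 2]})

# Reorder the factors to match rpk's columns; samples without a factor become NaN.
factors = scaling_factor.set_index('sample')['factor'].reindex(rpk.columns)

# List the samples that have no factor, so the NaN columns are not a surprise.
missing = factors[factors.isna()].index.tolist()

# Column-wise division, aligned on the column labels.
out = rpk / factors

print(missing)
print(out)
```

Reindexing first keeps the columns in the same order as in rpk and makes missing samples visible before the division instead of after.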
For integer division:
out = rpk // scaling_factor.set_index('sample')['factor']
print (out)
P1-6 P1-12 P1-25
0 1 1 2
1 2 2 4
2 3 3 1
3 4 4 3
Be careful: integer division rounds down, so if a value is not an exact multiple of its factor the result differs from true division:
rpk_data = {'P1-6': [1, 2, 3, 4],
'P1-12': [2, 4, 6, 8],
'P1-25': [5, 12, 3, 9]
}
rpk = pd.DataFrame.from_dict(rpk_data)
sc_f_data = {'sample': ['P1-6', 'P1-12', 'P1-25'],
'factor': [1, 2, 3]
}
scaling_factor = pd.DataFrame.from_dict(sc_f_data)
out = rpk / scaling_factor.set_index('sample')['factor']
print (out)
P1-6 P1-12 P1-25
0 1.0 1.0 1.666667
1 2.0 2.0 4.000000
2 3.0 3.0 1.000000
3 4.0 4.0 3.000000
out = rpk // scaling_factor.set_index('sample')['factor']
print (out)
P1-6 P1-12 P1-25
0 1 1 1
1 2 2 4
2 3 3 1
3 4 4 3
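If it is not clear in advance whether integer division is safe, one option is to look at the remainders first. A minimal sketch using the second data set above; the factors are written directly as a Series here for brevity, and the variable names are only illustrative:

```python
import pandas as pd

rpk = pd.DataFrame({'P1-6': [1, 2, 3, 4],
                    'P1-12': [2, 4, 6, 8],
                    'P1-25': [5, 12, 3, 9]})

# Factors keyed by sample name, equivalent to
# scaling_factor.set_index('sample')['factor'] in the answer above.
factors = pd.Series({'P1-6': 1, 'P1-12': 2, 'P1-25': 3})

# The modulo is aligned on column labels in the same way as the division.
remainder = rpk % factors

# True where the division leaves a remainder, i.e. where // would drop something.
inexact = remainder.ne(0)

# Columns that contain at least one inexact value.
cols_with_remainder = inexact.any().loc[lambda s: s].index.tolist()

print(cols_with_remainder)
```

An empty list would mean that // and / give the same numbers for every cell; otherwise the listed columns are the ones where integer division loses information.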