I have a multiindex Sries that I would like to make calculations on it. I'm not familiar with multiindex dataframes, some advice will be much appreciated.
d = {'c1':['a', 'a', 'b', 'b', 'c', 'c', 'c', 'd'], 'c2':['False', 'True', 'True', 'True', 'True', 'True', 'False', 'False']}
dd = pd.DataFrame(data=d)
dd.groupby('c1')['c2'].value_counts(normalize=True)
c1 c2
a False 0.500000
True 0.500000
b True 1.000000
c True 0.666667
False 0.333333
d False 1.000000
I'd like to take the max number with True
in the c2
column divides by other numbers with True
.
For the above example, it will be 1 / 0.500000
and 1 / 0.666667
Expected output
Basically, all you have to do is use the .xs
method on your value counts.
Take:
series = dd.groupby('c1')['c2'].value_counts(normalize=True)
To get the number of True
s by c1
you can do
true_counts = series.xs('True', level=1)
Which returns something like this:
c1
a 0.500000
b 1.000000
c 0.666667
Name: c2, dtype: float64
Now you can do all your operations on a single index Series. Start by taking the max_index
and max_value
, which, in this case (because it's normalized), could be achieved by true_counts.loc[true_counts == 1]
and then retrieving the values and index, or:
max_index = true_counts.index[true_counts.argmax()]
max_val = true_counts[max_index]
And then, doing the calculation over the remaining rows:
true_counts_dropped = true_counts.drop(max_index)
ans = max_val / true_counts_dropped
To make the index look like b/a
, b/c
, you can do:
ans.index = [max_index + '/' + idx for idx in ans.index]
And the final result:
>>> ans
b/a 2.0
b/c 1.5
Name: c2, dtype: float64