I have a DataFrame which contains more than 2000 rows.
Here is a part of my DataFrame:
In [2]: df
Out[2]:
   A  B   C     D
0  a  b  -1   3.5
1  a  b  -1    52
2  a  b  -1     2
3  a  b  -1     0
4  a  b   0    15
5  a  c  -1  1612
6  a  c   1    17
7  a  e   1    52
8  a  d  -1   412
9  a  d  -1   532
For each group of A, B and C, I would like to find the index of the value in column D that is closest to (next after) the group's median of D, and add a new column, Next_Med, to label it.
Here is the expected result:
   A  B   C     D  Next_Med
0  a  b  -1   3.5         1
1  a  b  -1    52         0
2  a  b  -1     2         0
3  a  b  -1     0         0
4  a  b   0    15         1
5  a  c  -1  1612         1
6  a  c   1    17         1
7  a  e   1    52         1
8  a  d  -1   412         0
9  a  d  -1   532         1
For example, for the combination a, b and -1, the median value is 2.75, so I'd like to label 3.5 as Next_Med.
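As a quick check of that arithmetic, here is a minimal sketch that assumes pandas is imported as pd and uses only the four D values of the (a, b, -1) group; the names d, med and dist are just for illustration:

```python
import pandas as pd

# D values of the (a, b, -1) group from the sample above
d = pd.Series([3.5, 52, 2, 0])

med = d.median()        # mean of the two middle values of 0, 2, 3.5, 52
dist = (d - med).abs()  # distance of each value from the median
```

Here 2 and 3.5 are equally far from the median, and judging by the expected result the one to label is 3.5, the value just above it.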
Try the following one-liner, which uses groupby and transform with a lambda:
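>>> # transform keeps the original index, so no sorting or reset_index is needed;
>>> # the -y in the key breaks ties in favour of the value above the median
>>> df['Next_Med'] = df.groupby([*'ABC'])['D'].transform(lambda x: x == min(x, key=lambda y: (abs(y - x.median()), -y))).astype(int)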
>>> df
   A  B   C       D  Next_Med
0  a  b  -1     3.5         1
1  a  b  -1    52.0         0
2  a  b  -1     2.0         0
3  a  b  -1     0.0         0
4  a  b   0    15.0         1
5  a  c  -1  1612.0         1
6  a  c   1    17.0         1
7  a  e   1    52.0         1
8  a  d  -1   412.0         0
9  a  d  -1   532.0         1
>>>
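Since the question also asks for the index itself, here is a possible multi-line sketch that returns the index labels directly. It reads "next" as "the smallest value at or above the group median", which is an interpretation inferred from the expected output rather than something stated outright. It also assumes D has no NaN values. The helper name next_median_index is made up for this example.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": ["a"] * 10,
    "B": ["b", "b", "b", "b", "b", "c", "c", "e", "d", "d"],
    "C": [-1, -1, -1, -1, 0, -1, 1, 1, -1, -1],
    "D": [3.5, 52, 2, 0, 15, 1612, 17, 52, 412, 532],
})

def next_median_index(s):
    # Hypothetical helper: index label of the smallest value
    # that is greater than or equal to the group's median.
    return s[s >= s.median()].idxmin()

# One index label per (A, B, C) group
idx = df.groupby(["A", "B", "C"])["D"].apply(next_median_index)

# Flag exactly those rows in the original frame
df["Next_Med"] = df.index.isin(idx).astype(int)
```

Unlike the equality test in the one-liner, this flags exactly one row per group even when the chosen value appears more than once, because idxmin returns only the first matching label.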