I'm new to Stack Overflow and I have this Data Set :
df=pd.DataFrame({'ID': {0: 4, 1: 4, 2: 4, 3: 88, 4: 88, 5: 323, 6: 323},
'Step': {0: 'A', 1: 'Bar', 2: 'F', 3: 'F', 4: 'Bar', 5: 'F', 6: 'A'},
'Num': {0: 38, 1: 38, 2: 38, 3: 320, 4: 320, 5: 433, 6: 432},
'Date': {0: '2018-08-02',
1: '2018-12-02',
2: '2019-03-02',
3: '2017-03-02',
4: '2018-03-02',
5: '2020-03-04',
6: '2020-02-03'},
'Occurence': {0: 3, 1: 3, 2: 3, 3: 2, 4: 2, 5: 2, 6: 2}})
The variables 'ID' and 'Step' are Multi-index.
I would like to do two things :
FIRST :
If 'Num' is different for the same 'ID', then delete the rows of this ID.
SECONDLY :
For a same ID, the step 'F' should be the last one (with the most recent date). If not, then delete the rows of this ID.
I have some difficulties because the commands df['Step'] and df['ID'] are NOT WORKING ('ID' and 'Step' are Multi-Index cause of a recent groupby() ).
I've tried groupby(level=0) that I found on Multi index dataframe delete row with maximum value per group
But I still have some difficulties.
Could someone please help me?
Expected Output :
df=pd.DataFrame({'ID': {0: 4, 1: 4, 2: 4},
'Step': {0: 'A', 1: 'Bar', 2: 'F'},
'Num': {0: 38, 1: 38, 2: 38},
'Date': {0: '2018-08-02',
1: '2018-12-02',
2: '2019-03-02',
'Occurence': {0: 3, 1: 3, 2: 3}})
The ID 88 has been removed because the step 'F' was not the last one step (with the most recent date). The ID 323 has been removed because Num 433!=Num 432.
Since you stated that ID and Step are in the index, we can do it this way:
df1[df1.sort_values('Date').groupby('ID')['Num']\
.transform(lambda x: (x.nunique() == 1) &
(x.index.get_level_values(1)[-1] == 'F'))]
Output:
Num Date Occurence
ID Step
4 A 38 2018-08-02 3
Bar 38 2018-12-02 3
F 38 2019-03-02 3