I have a dataframe containing the daily number of downloads for two apps. However, for every day I have three different download numbers: paid downloads (the highest value), organic downloads (the smallest value) and others (the middle value).
They are not labeled, so the only way to pick the right one is to sort the three values and take the one in the middle. The original dataset looks like this:
id | date | downloads |
---|---|---|
100 | 2018-01-05 | 2000 |
100 | 2018-01-05 | 45000 |
100 | 2018-01-05 | 44000 |
110 | 2018-01-05 | 3000 |
110 | 2018-01-05 | 7000 |
110 | 2018-01-05 | 8000 |
100 | 2018-01-06 | 9000 |
100 | 2018-01-06 | 77000 |
100 | 2018-01-06 | 75000 |
110 | 2018-01-06 | 1000 |
110 | 2018-01-06 | 6000 |
110 | 2018-01-06 | 9000 |
And the final result I need would look like this:
id | date | downloads |
---|---|---|
100 | 2018-01-05 | 44000 |
110 | 2018-01-05 | 7000 |
100 | 2018-01-06 | 75000 |
110 | 2018-01-06 | 6000 |
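The rule described above (order the three values, keep the middle one) can be checked against the expected output in plain Python. This is a minimal sketch, assuming all three readings for a day share one date, as the expected output implies; the `rows`, `groups` and `middle` names are illustrative only.

```python
from collections import defaultdict

# Sample rows from the question as (id, date, downloads).
# Assumption: every reading for a given day shares the same date.
rows = [
    (100, "2018-01-05", 2000), (100, "2018-01-05", 45000), (100, "2018-01-05", 44000),
    (110, "2018-01-05", 3000), (110, "2018-01-05", 7000), (110, "2018-01-05", 8000),
    (100, "2018-01-06", 9000), (100, "2018-01-06", 77000), (100, "2018-01-06", 75000),
    (110, "2018-01-06", 1000), (110, "2018-01-06", 6000), (110, "2018-01-06", 9000),
]

# Collect the three unlabeled values for each (id, date) pair.
groups = defaultdict(list)
for app_id, day, downloads in rows:
    groups[(app_id, day)].append(downloads)

# After sorting: organic is at index 0, others at index 1, paid at index 2.
middle = {key: sorted(values)[1] for key, values in groups.items()}

# Print in the date-then-id order of the expected output.
for (app_id, day), value in sorted(middle.items(), key=lambda kv: (kv[0][1], kv[0][0])):
    print(app_id, day, value)
```

This prints the four rows of the expected result table.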
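Sort by `downloads` first, so that inside each `(id, date)` group the rows run organic, others, paid. Then use `groupby` and take the second element of each group with `nth` (it is zero-based, so `nth(1)` is the middle value). Without the sort, `nth(1)` just returns whichever row happens to come second, e.g. 45000 instead of 44000 for id 100 on 2018-01-05:

    df.sort_values('downloads').groupby(['id', 'date'], as_index=False).nth(1)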
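A minimal, runnable sketch of this approach on the question's sample data. It sorts by `downloads` before grouping, because `nth(1)` only picks the second row in each group's current order and `groupby` keeps that row order within groups. The DataFrame construction and the final re-sort into date/id order are illustrative additions, and the dates assume every reading for a day shares one date, as the expected output implies.

```python
import pandas as pd

# Sample data from the question (assumption: all readings for a day share one date).
df = pd.DataFrame({
    "id": [100, 100, 100, 110, 110, 110, 100, 100, 100, 110, 110, 110],
    "date": ["2018-01-05"] * 6 + ["2018-01-06"] * 6,
    "downloads": [2000, 45000, 44000, 3000, 7000, 8000,
                  9000, 77000, 75000, 1000, 6000, 9000],
})

# Sort so that, inside each (id, date) group, rows run organic < others < paid.
ordered = df.sort_values("downloads")

# nth(1) is zero-based, so it keeps the second (middle) row of each group.
middle = ordered.groupby(["id", "date"], as_index=False).nth(1)

# Restore the date-then-id order used in the expected output.
result = middle.sort_values(["date", "id"]).reset_index(drop=True)
print(result)
```

Since each group has exactly three values, `df.groupby(['id', 'date'], as_index=False)['downloads'].median()` should give the same numbers as an alternative, though the column may come back as float rather than int.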