python pandas dataframe dataset

Dataframe with one date and three distinct values: how can I get the one in the middle?


I have a dataframe containing the daily number of downloads for two apps. However, for each app and each day I have three different download numbers: paid downloads (the highest value), organic downloads (the smallest value) and others (the middle value).

They are not labeled, so the only thing I know is that I need to order those three values and get the one in the middle. The original dataset looks like this:

id date downloads
100 2018-01-05 2000
100 2018-01-05 45000
100 2018-01-05 44000
110 2018-01-05 3000
110 2018-01-05 7000
110 2018-01-05 8000
100 2018-01-06 9000
100 2018-01-06 77000
100 2018-01-06 75000
110 2018-01-06 1000
110 2018-01-06 6000
110 2018-01-06 9000

And the final result I need would look like this:

id date downloads
100 2018-01-05 44000
110 2018-01-05 7000
100 2018-01-06 75000
110 2018-01-06 6000

Solution

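  • Sort by downloads first, then use groupby to take the second element of each group with nth. Without the sort, nth(1) returns the second row in its original order, which is not necessarily the middle value:

    df.sort_values('downloads').groupby(['id', 'date'], as_index=False).nth(1)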
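As a rough, self-contained sketch of the approach, the snippet below rebuilds the sample data and picks the middle value per (id, date) group in two ways. It assumes that all three rows of a group share one date, as the expected result implies, and that every group has exactly three rows. Sorting by downloads before grouping is what makes position 1 the middle value. The median variant is an alternative not mentioned in the answer, and the names middle_nth and middle_median are only illustrative.

```python
import pandas as pd

# Sample data from the question.
# Assumption: all three rows of a group share the same date.
df = pd.DataFrame({
    "id": [100, 100, 100, 110, 110, 110, 100, 100, 100, 110, 110, 110],
    "date": ["2018-01-05"] * 6 + ["2018-01-06"] * 6,
    "downloads": [2000, 45000, 44000, 3000, 7000, 8000,
                  9000, 77000, 75000, 1000, 6000, 9000],
})

# Option 1: sort by downloads, then keep the row at position 1 of each group.
middle_nth = (
    df.sort_values("downloads")
      .groupby(["id", "date"], as_index=False)
      .nth(1)
      .sort_values(["date", "id"])
      .reset_index(drop=True)
)

# Option 2: with exactly three values per group, the median is the middle one.
# median() returns floats, so cast back to int for a cleaner result.
middle_median = (
    df.groupby(["id", "date"], as_index=False)["downloads"]
      .median()
      .astype({"downloads": int})
      .sort_values(["date", "id"])
      .reset_index(drop=True)
)

print(middle_nth)
print(middle_median)
```

Both options should give 44000, 7000, 75000 and 6000 for the four groups. The median variant does not depend on row order, but it only returns an actual observed value when each group has an odd number of rows.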