Search code examples
pythondataframedatetimedate-range

Count values by date in Python


Something was active for some period of time (e.g. from 01-01-2022 to 01-02-2022). I have a list of those items, every one was active for a different period of time. I want to count how many items were active every day. I know only first and last day for each of them.

df = {'name': ['a','b','c','d','e','f'], 'first_day': ['01-01-2022', '15-02-2022', '20-01-2022', '5-02-2022', '19-02-2022', '21-02-2022'], 'last_day':['09-01-2022', '20-02-2022', '27-02-2022', '27-02-2022','19-02-2022', '28-02-2022']}

Result: on 01-01-2022 active was only 1 item (a), on 02-01-2022 = 1 (a), [...], on 26-02-2022 = 3 (c, d, f), [...], on 28-02-2022 = 1 (f). I care only about the count of items and I have no idea how to do it.


Solution

  • Assuming this is for pandas dataframe: create a list of days for each element, explode list into multiple rows and finally use groupby:

    df = pd.DataFrame({'name': ['a', 'b', 'c', 'd', 'e', 'f'],
                       'first_day': ['01-01-2022', '15-02-2022', '20-01-2022', '5-02-2022', '19-02-2022', '21-02-2022'],
                       'last_day': ['09-01-2022', '20-02-2022', '27-02-2022', '27-02-2022', '19-02-2022',
                                    '28-02-2022']})
    df['first'] = pd.to_datetime(df['first_day'], format="%d-%m-%Y")
    df['last'] = pd.to_datetime(df['last_day'], format="%d-%m-%Y")
    df['days'] = df.apply(lambda x: pd.date_range(x['first'], x['last'], freq='D').tolist(), axis=1)
    result = df.explode('days').groupby('days').size()