python pandas dataframe group-by

Groupby number of hours in a month in pandas


Could someone please explain how to group an hourly-indexed dataframe by month, to find how many hours of null values there are in each month? I am thinking of ending up with a dataframe that has a monthly index.

Below is the dataframe, which has a timestamp index and one column that occasionally contains null values.

timestamp rel_humidity
1999-09-27 05:00:00 82.875
1999-09-27 06:00:00 83.5
1999-09-27 07:00:00 83.0
1999-09-27 08:00:00 80.6
1999-09-27 09:00:00 nan
1999-09-27 10:00:00 nan
1999-09-27 11:00:00 nan
1999-09-27 12:00:00 nan

I tried this but the resulting dataframe is not what I expected.

gap_in_month = OG_1998_2022_gaps.groupby(OG_1998_2022_gaps.index.month, OG_1998_2022_gaps.index.year).count()

I always struggle with the groupby function, so I would highly appreciate any help. Thanks in advance!


Solution

  • If you need 0 for months with no missing values, create a boolean mask with Series.isna, convert the DatetimeIndex to monthly periods with DatetimeIndex.to_period, and aggregate with sum (each True in the mask counts as 1). Alternatively, group with pd.Grouper:

    gap_in_month = (OG_1998_2022_gaps['rel_humidity'].isna()
                     .groupby(OG_1998_2022_gaps.index.to_period('M')).sum())
    
    # month-end bins; on pandas 2.2+ the alias is 'ME' instead of 'M'
    gap_in_month = (OG_1998_2022_gaps['rel_humidity'].isna()
                     .groupby(pd.Grouper(freq='M')).sum())
    

    If you only want to count the matched rows, the solution is similar, but first filter the rows with boolean indexing and then count them with GroupBy.size:

    # filter first, then group by the filtered frame's own index so lengths match
    missing = OG_1998_2022_gaps[OG_1998_2022_gaps['rel_humidity'].isna()]
    gap_in_month = missing.groupby(missing.index.to_period('M')).size()
    
    # month-end bins; on pandas 2.2+ the alias is 'ME' instead of 'M'
    gap_in_month = missing.groupby(pd.Grouper(freq='M')).size()
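
To see how the two approaches differ, here is a minimal self-contained sketch. The frame `df` and its values are made up for illustration (they are not the asker's data), with one month deliberately left without gaps. The last line also shows one way to group by year and month as in the question's attempt: the keys are passed as a single list, because a second positional argument to groupby is not treated as a grouping key.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a few hourly readings spread over three months.
idx = pd.to_datetime([
    "1999-09-27 08:00", "1999-09-27 09:00", "1999-09-27 10:00",
    "1999-10-01 00:00", "1999-10-01 01:00",
    "1999-11-01 00:00", "1999-11-01 01:00",
])
df = pd.DataFrame(
    {"rel_humidity": [80.6, np.nan, np.nan, 81.0, 82.0, np.nan, 79.5]},
    index=idx,
)

# Approach 1: sum a boolean mask per month; months without gaps show up as 0.
months = df.index.to_period("M")
missing_per_month = df["rel_humidity"].isna().groupby(months).sum()

# Approach 2: keep only the missing rows, then count rows per month;
# months without gaps do not appear at all.
missing = df[df["rel_humidity"].isna()]
only_missing = missing.groupby(missing.index.to_period("M")).size()

# Grouping by (year, month): pass both keys together in one list.
by_year_month = (
    df["rel_humidity"].isna().groupby([df.index.year, df.index.month]).sum()
)

print(missing_per_month)
print(only_missing)
print(by_year_month)
```

In this sketch the first result has an entry for every month present in the index, while the second lists only the months that contain at least one missing hour.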