Search code examples
pythonpandasmulti-indexpanel-data

Using multindex resample in pandas with zeros results in NaN


I have a data frame that looks like this...

               name            datetime    value
1           Michael 2012-01-11 01:11:19     3999
3             Janus 2014-01-11 00:01:10        1
4             Janus 2014-01-11 01:12:29       81 
5             Janus 2014-01-11 01:11:19     8047
...             ...                 ...      ...
9001230  Excalibur1 2014-12-25 01:11:10        1
9001231  Excalibur1 2014-12-25 02:11:10        2
9001232  Excalibur1 2013-12-25 03:11:10        3
9001233  Excalibur1 2014-12-25 04:11:10        4
9001234  Excalibur1 2014-12-25 05:11:10      907

I want to find, for each name, how many the total value each day.

In days with no values listed, I want these days to still exist, and I want to fill zeros.

I can do the following, which seems to only sometimes zero fill:

df.set_index(["datetime"]).groupby("name").resample("D")["value"].sum()
name        datetime
Michael     2012-01-11     3999
Janus       2014-01-11     8129
            2014-10-12        0
            2014-10-13        0
            2014-10-14        0
                            ...  
Excalibur1  2014-12-21        0
            2014-12-22        1
            2014-12-23        0
            2014-12-24        0
            2014-12-25      917

Why is it /sometimes/ filling with zeros and what's the right way to rectangularize this so I have all dates summed and filled with zero where no records exist?

Update

The index I would like is composed of...

index=pd.MultiIndex.from_product([
pd.date_range(
  df["datetime"].dt.date.min(),  
  df["datetime"].dt.date.max(), freq="D"),
sorted(df["name"].unique()
)

And dense.


Solution

  • Don't resample, but use the date in the groupby:

    df['datetime'] = pd.to_datetime(df['datetime'])
    
    df.groupby(['name', df['datetime'].dt.date]).sum()
    

    Or, using pandas.Grouper for flexibility:

    df.groupby(['name', pd.Grouper(key='datetime', freq='D')]).sum()
    

    Output:

                           value
    name       datetime         
    Excalibur1 2013-12-25      3
               2014-12-25    914
    Janus      2014-01-11   8129
    Michael    2012-01-11   3999
    

    rectangular shape and missing dates:

    For a rectangular shape use:

    df2 = df.groupby(['name', pd.Grouper(key='datetime', freq='D')])['value'].sum().unstack(level='name', fill_value=0)
    

    Output:

    name        Excalibur1  Janus  Michael
    datetime                              
    2013-12-25           3      0        0
    2014-12-25         914      0        0
    2014-01-11           0   8129        0
    2012-01-11           0      0     3999
    

    And to add missing dates, reindex:

    df2 = df.groupby(['name', pd.Grouper(key='datetime', freq='D')])['value'].sum().unstack(level='name', fill_value=0)
    df2 = df2.reindex(pd.date_range(df['datetime'].dt.date.min(), df['datetime'].max()), fill_value=0)
    

    Output:

    name        Excalibur1  Janus  Michael
    2012-01-11           0      0     3999
    2012-01-12           0      0        0
    2012-01-13           0      0        0
    2012-01-14           0      0        0
    2012-01-15           0      0        0
    ...