Access a pandas group as new data frame


I am new to data analysis with Python/pandas, coming from a MATLAB background. I am trying to group data and then process the individual groups. However, I cannot figure out how to actually access the grouping result.

Here is my setup: I have a pandas dataframe df with a regularly spaced DatetimeIndex named timestamp at a 10-minute frequency. My data spans several weeks in total. I now want to group the data by day, like so:

grouping = df.groupby([pd.Grouper(level="timestamp", freq="D")])

Note that I do not want to aggregate the groups (contrary to most examples and tutorials, it seems). I simply want to take each group in turn and process it individually, like so (does not work):

for g in grouping:
  g_df = d.toDataFrame()
  some_processing(g_df)

How do I do that? I haven't found any way to extract daily dataframe objects from the DataFrameGroupBy object.


Solution

  • Expand your groups into a dictionary of dataframes:

    data = dict(list(df.groupby(df.index.date.astype(str))))
    
    >>> data.keys()
    dict_keys(['2021-01-01', '2021-01-02'])
    
    >>> data['2021-01-01']
                            value
    timestamp                    
    2021-01-01 00:00:00  0.405630
    2021-01-01 01:00:00  0.262235
    2021-01-01 02:00:00  0.913946
    2021-01-01 03:00:00  0.467516
    2021-01-01 04:00:00  0.367712
    2021-01-01 05:00:00  0.849070
    2021-01-01 06:00:00  0.572143
    2021-01-01 07:00:00  0.423401
    2021-01-01 08:00:00  0.931463
    2021-01-01 09:00:00  0.554809
    2021-01-01 10:00:00  0.561663
    2021-01-01 11:00:00  0.537471
    2021-01-01 12:00:00  0.461099
    2021-01-01 13:00:00  0.751878
    2021-01-01 14:00:00  0.266371
    2021-01-01 15:00:00  0.954553
    2021-01-01 16:00:00  0.895575
    2021-01-01 17:00:00  0.752671
    2021-01-01 18:00:00  0.230219
    2021-01-01 19:00:00  0.750243
    2021-01-01 20:00:00  0.812728
    2021-01-01 21:00:00  0.195416
    2021-01-01 22:00:00  0.178367
    2021-01-01 23:00:00  0.607105
    

    Note: I changed the group keys to make indexing easier: '2021-01-01' instead of Timestamp('2021-01-01 00:00:00', freq='D')
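
If you would rather not build the dictionary first, you can also loop over the groupby object directly, since iterating it yields (key, dataframe) pairs rather than bare dataframes. Below is a minimal sketch of that, not a definitive implementation. The sample data and the body of some_processing are made up for illustration, and the Grouper is passed on its own rather than inside a list so that each key is a plain Timestamp.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: three days of 10-minute readings.
index = pd.date_range("2021-01-01", periods=3 * 144, freq="10min", name="timestamp")
df = pd.DataFrame({"value": np.arange(len(index), dtype=float)}, index=index)


def some_processing(g_df):
    # Placeholder for the real per-day processing; here it just counts rows.
    return len(g_df)


# Grouper passed directly (not in a list), an assumption made for this sketch.
grouping = df.groupby(pd.Grouper(level="timestamp", freq="D"))

# Each iteration yields the group key and the matching sub-dataframe.
results = {}
for day, g_df in grouping:
    results[day] = some_processing(g_df)
```

Each g_df here is an ordinary DataFrame holding one day's rows, so no conversion step is needed before processing it.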