How to create sub-DataFrames based on time step conditions?


I have a DataFrame that looks like this:

df
                         Pression
13/01/2022  09:01:02        3500
13/01/2022  09:01:13        3650
13/01/2022  09:01:24        4248
13/01/2022  09:01:35        4259
13/01/2022  09:01:46        4270
13/01/2022  11:43:28         345
13/01/2022  11:43:39         478
13/01/2022  11:43:50         589
15/01/2022  08:31:14        2048
15/01/2022  08:31:25        2574
15/01/2022  08:31:36        3659
15/01/2022  08:31:47        3784
15/01/2022  08:31:58        3968
15/01/2022  08:32:09        4009

I want to split it into sub-DataFrames based on the time step: whenever the gap between two consecutive rows is greater than 11 seconds, a new sub-DataFrame should start.

EXPECTED OUTPUT:

df_1
                         Pression
13/01/2022  09:01:02        3500
13/01/2022  09:01:13        3650
13/01/2022  09:01:24        4248
13/01/2022  09:01:35        4259
13/01/2022  09:01:46        4270

df_2
                         Pression
13/01/2022  11:43:28         345
13/01/2022  11:43:39         478
13/01/2022  11:43:50         589


df_3
                         Pression
15/01/2022  08:31:14        2048
15/01/2022  08:31:25        2574
15/01/2022  08:31:36        3659
15/01/2022  08:31:47        3784
15/01/2022  08:31:58        3968
15/01/2022  08:32:09        4009

How can I do this without using a loop? I am already working with a list of DataFrames, so looping could be computationally expensive.


Solution

  • Create a dictionary of DataFrames, keyed by group number:

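    import pandas as pd

    # the dates are day-first (13/01/2022), so state that explicitly when parsing
    df.index = pd.to_datetime(df.index, dayfirst=True)
    
    # convert the DatetimeIndex to a Series so that diff() is available
    s = df.index.to_series()
    # a gap longer than 11 seconds starts a new group; cumsum numbers the groups 0, 1, 2, ...
    dfs = dict(tuple(df.groupby(s.diff().gt(pd.Timedelta(seconds=11)).cumsum())))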
    
    print(dfs[0])
                         Pression
    2022-01-13 09:01:02      3500
    2022-01-13 09:01:13      3650
    2022-01-13 09:01:24      4248
    2022-01-13 09:01:35      4259
    2022-01-13 09:01:46      4270
    
    
    print(dfs[1])
                         Pression
    2022-01-13 11:43:28       345
    2022-01-13 11:43:39       478
    2022-01-13 11:43:50       589
    
    print(dfs[2])
                         Pression
    2022-01-15 08:31:14      2048
    2022-01-15 08:31:25      2574
    2022-01-15 08:31:36      3659
    2022-01-15 08:31:47      3784
    2022-01-15 08:31:58      3968
    2022-01-15 08:32:09      4009
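
For anyone who wants to try this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same diff/cumsum grouping. The helper name `split_on_gaps` and its `max_gap` parameter are made up for illustration, and the explicit `format=` string assumes a single space between date and time. Returning a list instead of a dict is a choice made for this sketch, not part of the answer above.

```python
import pandas as pd


def split_on_gaps(df, max_gap="11s"):
    """Split a datetime-indexed DataFrame wherever the step exceeds max_gap.

    Hypothetical helper: the name and signature are illustrative only.
    """
    steps = df.index.to_series().diff()  # first value is NaT
    # True marks the first row of each new block; NaT compares as False
    starts_block = steps.gt(pd.Timedelta(max_gap))
    # the cumulative sum turns the boolean markers into block numbers 0, 1, 2, ...
    block_id = starts_block.cumsum().to_numpy()
    return [part for _, part in df.groupby(block_id)]


# sample data copied from the question
stamps = [
    "13/01/2022 09:01:02", "13/01/2022 09:01:13", "13/01/2022 09:01:24",
    "13/01/2022 09:01:35", "13/01/2022 09:01:46",
    "13/01/2022 11:43:28", "13/01/2022 11:43:39", "13/01/2022 11:43:50",
    "15/01/2022 08:31:14", "15/01/2022 08:31:25", "15/01/2022 08:31:36",
    "15/01/2022 08:31:47", "15/01/2022 08:31:58", "15/01/2022 08:32:09",
]
values = [3500, 3650, 4248, 4259, 4270,
          345, 478, 589,
          2048, 2574, 3659, 3784, 3968, 4009]

df = pd.DataFrame(
    {"Pression": values},
    index=pd.to_datetime(stamps, format="%d/%m/%Y %H:%M:%S"),
)

parts = split_on_gaps(df)
for i, part in enumerate(parts, start=1):
    print(f"df_{i}: {len(part)} rows")
```

With the sample data this should yield three pieces of 5, 3 and 6 rows. Note that the comparison is strict, so rows exactly 11 seconds apart stay in the same piece; pass a different `max_gap` if the sampling interval is not exactly 11 seconds.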