Search code examples
pythonpandasdataframeslice

How can I split my dataframe so that consecutive numbers are each in their own dataframe?


I am new to using the python module pandas dataframes and i might have a silly question. I have a large dataframe that is structured as follows:

index ConsecutiveNumbers
idx1 1
idx2 2
idx3 3
idx4 4
idx5 1
idx6 2
idx7 3
and so on and so on

How can i get this single dataframe into a list of multiple dataframes that each look like this:

df1:

index ConsecutiveNumbers
idx1 1
idx2 2
idx3 3
idx4 4

df2:

index ConsecutiveNumbers
idx5 1
idx6 2
idx7 3

and so on

The tricky part is that not every number series has the same length so the dataframe itself can not be splitted after every 120th row.

I tried to manually search for the beginning and end of the consecutive numbers series (with a for loop that iterrates rowwise through the dataframe). But that's not really fast and I feel like there has to be a better solution (maybe a slicing approach i am unaware of). Since i am new to using dataframes, i lack of experience in this "more advanced" slicing approaches that search for a start and end of a consecutive series and get the inbetween as a new dataframe.


Solution

  • create dictionary of DataFrames by comapre difference of ConsecutiveNumbers column:

    dfs = dict(iter(df.groupby((df['ConsecutiveNumbers'].diff().ne(1).cumsum()))))
    print (dfs)
    {1:        ConsecutiveNumbers
    index                    
    idx1                    1
    idx2                    2
    idx3                    3
    idx4                    4, 2:        ConsecutiveNumbers
    index                    
    idx5                    1
    idx6                    2
    idx7                    3}
    

    If need loop in dictionary:

    for k, df in dfs.items():
        print (df)
           ConsecutiveNumbers
    index                    
    idx1                    1
    idx2                    2
    idx3                    3
    idx4                    4
           ConsecutiveNumbers
    index                    
    idx5                    1
    idx6                    2
    idx7                    3