I am new to using the python module pandas dataframes and i might have a silly question. I have a large dataframe that is structured as follows:
index | ConsecutiveNumbers |
---|---|
idx1 | 1 |
idx2 | 2 |
idx3 | 3 |
idx4 | 4 |
idx5 | 1 |
idx6 | 2 |
idx7 | 3 |
and so on | and so on |
How can i get this single dataframe into a list of multiple dataframes that each look like this:
df1:
index | ConsecutiveNumbers |
---|---|
idx1 | 1 |
idx2 | 2 |
idx3 | 3 |
idx4 | 4 |
df2:
index | ConsecutiveNumbers |
---|---|
idx5 | 1 |
idx6 | 2 |
idx7 | 3 |
and so on
The tricky part is that not every number series has the same length so the dataframe itself can not be splitted after every 120th row.
I tried to manually search for the beginning and end of the consecutive numbers series (with a for loop that iterrates rowwise through the dataframe). But that's not really fast and I feel like there has to be a better solution (maybe a slicing approach i am unaware of). Since i am new to using dataframes, i lack of experience in this "more advanced" slicing approaches that search for a start and end of a consecutive series and get the inbetween as a new dataframe.
create dictionary of DataFrames by comapre difference of ConsecutiveNumbers
column:
dfs = dict(iter(df.groupby((df['ConsecutiveNumbers'].diff().ne(1).cumsum()))))
print (dfs)
{1: ConsecutiveNumbers
index
idx1 1
idx2 2
idx3 3
idx4 4, 2: ConsecutiveNumbers
index
idx5 1
idx6 2
idx7 3}
If need loop in dictionary:
for k, df in dfs.items():
print (df)
ConsecutiveNumbers
index
idx1 1
idx2 2
idx3 3
idx4 4
ConsecutiveNumbers
index
idx5 1
idx6 2
idx7 3