I have a pandas DataFrame containing the stops and scheduled times of a single transit route over a given day. I would like to split it into multiple frames, each corresponding to an individual trip made by a given bus (based only on the stop cycle, not on any periodicity in the scheduled times).
For example, the following has two A->B->C trips, so I am looking for a way to split the frame (i.e. at index 3 in this case) such that each sub-frame has the same sequence of stops.
import pandas as pd
df = pd.DataFrame({
"scheduled": ["2023-05-25 13:00", "2023-05-25 13:15", "2023-05-25 13:45", "2023-05-25 14:35", "2023-05-25 14:50", "2023-05-25 15:20"],
"stop": ["A", "B", "C", "A", "B", "C"]
})
df["scheduled"] = pd.to_datetime(df["scheduled"])
Assuming that you don't know how many stops there are but that the pattern always repeats, you could compare each stop to the first stop name, increment a counter every time that stop is found (cumsum), then use groupby to split:
group = df['stop'].eq(df['stop'].iloc[0]).cumsum()
out = [g for _,g in df.groupby(group)]
Output:
[  scheduled stop
0   1:00 pm    A
1   1:15 pm    B
2   1:45 pm    C,
   scheduled stop
3   2:35 pm    A
4   2:50 pm    B
5   3:20 pm    C]
Intermediate with the group number:
  scheduled stop  group
0   1:00 pm    A      1
1   1:15 pm    B      1
2   1:45 pm    C      1
3   2:35 pm    A      2
4   2:50 pm    B      2
5   3:20 pm    C      2
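For reference, the whole approach can be sketched end to end as below. This is only a minimal illustration on the sample data from the question; the names `trip_id` and `trips` are made up for the example, and it assumes every trip starts at the same first stop.

```python
import pandas as pd

df = pd.DataFrame({
    "scheduled": pd.to_datetime([
        "2023-05-25 13:00", "2023-05-25 13:15", "2023-05-25 13:45",
        "2023-05-25 14:35", "2023-05-25 14:50", "2023-05-25 15:20",
    ]),
    "stop": ["A", "B", "C", "A", "B", "C"],
})

# True wherever a row is the first stop of the route; cumsum turns these
# markers into a running trip number (1, 1, 1, 2, 2, 2).
trip_id = df["stop"].eq(df["stop"].iloc[0]).cumsum()

# Map each trip number to its own sub-frame (hypothetical helper structure).
trips = {tid: g.reset_index(drop=True) for tid, g in df.groupby(trip_id)}
```

Here `trips[1]` and `trips[2]` would each hold one A->B->C trip. Dropping the `reset_index` call keeps the original row labels instead.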
Another option: compute the number of unique stops (nunique) and use it to split with numpy.array_split. Note that this assumes each trip visits every stop exactly once:
import numpy as np
n = df['stop'].nunique()
out = np.array_split(df, range(n, len(df), n))
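Depending on the installed pandas and NumPy versions, passing a DataFrame to np.array_split may emit a deprecation warning. As a hedged alternative (not part of the original answer), the same fixed-size chunks can be taken with plain iloc slicing. The sketch below makes the same assumption that each trip contains every stop exactly once.

```python
import pandas as pd

df = pd.DataFrame({
    "scheduled": pd.to_datetime([
        "2023-05-25 13:00", "2023-05-25 13:15", "2023-05-25 13:45",
        "2023-05-25 14:35", "2023-05-25 14:50", "2023-05-25 15:20",
    ]),
    "stop": ["A", "B", "C", "A", "B", "C"],
})

# Number of distinct stops, taken here as the length of one trip.
n = df["stop"].nunique()

# Slice the frame into consecutive blocks of n rows each.
out = [df.iloc[start:start + n] for start in range(0, len(df), n)]
```

Each element of `out` keeps its original index labels, so the second trip still starts at index 3.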