I have a multi-index Pandas DataFrame. In my example there are two levels: vehicles (with values A and B) and reference_days (with values 1 and 2). For each vehicle and each day there is a set of moments in time, stored in HHMM format as strings, so that e.g. '2330' corresponds to 11.30pm and '30' to 0.30am (the example below uses plain integers). These moments are ordered chronologically, but within one reference_day they may cross midnight; that is, a moment at 02.00am may be counted towards the PREVIOUS day. I want a new column that takes the value 1 if the moment in that row actually falls on the next day (i.e. the midnight line has been crossed), and 0 otherwise. This corresponds to a train timetable in which trips between midnight and (approximately) 4am are registered under the preceding day.
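To make the HHMM encoding concrete, here is a minimal sketch that decodes a value into minutes after midnight (the helper name `hhmm_to_minutes` is hypothetical, not part of pandas). It also shows why the values should be compared as numbers rather than strings: integer HHMM values sort in clock order within a day, so a drop between consecutive rows can only mean midnight was crossed, whereas strings are compared character by character.

```python
def hhmm_to_minutes(hhmm):
    """Hypothetical helper: convert an HHMM time such as 2330 or "30" to minutes after midnight."""
    hours, minutes = divmod(int(hhmm), 100)
    return hours * 60 + minutes

print(hhmm_to_minutes("2330"))  # 1410, i.e. 23:30
print(hhmm_to_minutes(30))      # 30, i.e. 00:30

# Integer HHMM values are ordered like the clock times they encode...
print(2330 > 30)       # True
# ...but strings compare character by character, and '2' < '3'
print("2330" > "30")   # False
```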
Example:
import pandas as pd

# `data` instead of `dict`, to avoid shadowing the built-in
data = {"vehicle": ["A"]*8 + ["B"]*8,
        "reference_day": [1, 1, 1, 1, 2, 2, 2, 2]*2,
        "time": [1830, 2200, 30, 115, 1700, 1800, 2300, 100,
                 1900, 2300, 15, 200, 1500, 2000, 2330, 120]}
df = pd.DataFrame(data).set_index(["vehicle", "reference_day"])
The DataFrame looks like this:
                       time
vehicle reference_day
A       1              1830
        1              2200
        1                30
        1               115
        2              1700
        2              1800
        2              2300
        2               100
B       1              1900
        1              2300
        1                15
        1               200
        2              1500
        2              2000
        2              2330
        2               120
I want to have an extra column like this:
                       time  next_day
vehicle reference_day
A       1              1830         0
        1              2200         0
        1                30         1
        1               115         1
        2              1700         0
        2              1800         0
        2              2300         0
        2               100         1
B       1              1900         0
        1              2300         0
        1                15         1
        1               200         1
        2              1500         0
        2              2000         0
        2              2330         0
        2               120         1
How can I achieve this in an elegant way? I hope someone can help, thanks!
Let's try:
df['next_day'] = df.groupby(level=[0,1])['time']\
.transform(lambda x: x.diff().lt(0).cumsum())
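To see why this works, here is the same chain applied by hand to a single group (vehicle A, reference_day 1), as a minimal illustration that assumes the times are chronological and stored as integers: diff() gives the change from the previous row (NaN for the first row), lt(0) marks the row where the HHMM value drops, which can only happen when midnight is crossed (NaN compares as False), and cumsum() turns that one-off marker into a flag that stays on for the rest of the group.

```python
import pandas as pd

# Times for vehicle A, reference_day 1, in timetable order
times = pd.Series([1830, 2200, 30, 115])

step_diff = times.diff()        # NaN, 370.0, -2170.0, 85.0
went_back = step_diff.lt(0)     # False, False, True, False
next_day = went_back.cumsum()   # 0, 0, 1, 1

print(pd.DataFrame({"time": times, "diff": step_diff,
                    "crossed": went_back, "next_day": next_day}))
```

Grouping by both index levels simply repeats this per (vehicle, reference_day) pair, so the counter restarts at 0 for every group.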
Output:
time next_day
vehicle reference_day
A 1 1830 0
1 2200 0
1 30 1
1 115 1
2 1700 0
2 1800 0
2 2300 0
2 100 1
B 1 1900 0
1 2300 0
1 15 1
1 200 1
2 1500 0
2 2000 0
2 2330 0
2 120 1
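A variant sketch, assuming the time column really holds strings as described in the question: convert it with pd.to_numeric first, since the diff() step needs numeric values and string order would not follow the clock anyway. It also swaps cumsum() for cummax() on an integer marker so the flag is capped at 1 (the name `time_int` is just illustrative).

```python
import pandas as pd

data = {"vehicle": ["A"] * 8 + ["B"] * 8,
        "reference_day": [1, 1, 1, 1, 2, 2, 2, 2] * 2,
        # Times as HHMM strings, as described in the question
        "time": ["1830", "2200", "30", "115", "1700", "1800", "2300", "100",
                 "1900", "2300", "15", "200", "1500", "2000", "2330", "120"]}
df = pd.DataFrame(data).set_index(["vehicle", "reference_day"])

# Convert HHMM strings to integers so that differences are meaningful
time_int = pd.to_numeric(df["time"])

# Mark the drop at midnight, then keep the flag at 1 for the rest of each group
df["next_day"] = (time_int.groupby(level=[0, 1])
                  .transform(lambda x: x.diff().lt(0).astype(int).cummax()))
print(df)
```

With cumsum(), a group that crossed midnight twice (for example a trip spanning more than 24 hours) would get 2 on its later rows; whether a day count or a strict 0/1 flag is preferable depends on how the timetable is used.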