The data that I currently have resembles something like the table below:
Station A operates from 12AM to 1AM, Station B operates from 1AM to 2AM and Station C operates from 2AM onwards.
Date Time | Station | Entries | Exits |
---|---|---|---|
2023-01-01 00:00:00 | Station A | 1 | 1 |
2023-01-01 01:00:00 | Station A | 1 | 1 |
2023-01-01 01:00:00 | Station B | 2 | 2 |
2023-01-01 02:00:00 | Station B | 2 | 2 |
2023-01-01 02:00:00 | Station C | 3 | 3 |
I would like to add in additional rows so that my dataframe holds 'n' rows at each unique hour frequency, where n is the number of unique Stations. In this example it would mean 3 rows (for Stations A, B, C) for 12AM, 1AM and 2AM for a total of 9. What pandas functions can I use to convert the above table to a format similar to the one below?
Date Time | Station | Entries | Exits |
---|---|---|---|
2023-01-01 00:00:00 | Station A | 1 | 1 |
2023-01-01 00:00:00 | Station B | 0 | 0 |
2023-01-01 00:00:00 | Station C | 0 | 0 |
2023-01-01 01:00:00 | Station A | 1 | 1 |
2023-01-01 01:00:00 | Station B | 2 | 2 |
2023-01-01 01:00:00 | Station C | 0 | 0 |
2023-01-01 02:00:00 | Station A | 0 | 0 |
2023-01-01 02:00:00 | Station B | 2 | 2 |
2023-01-01 02:00:00 | Station C | 3 | 3 |
Also including code for generating the initial dataframe below for ease of reference.
df = pd.DataFrame(
{'Date Time':pd.to_datetime(['2023-01-01 00:00:00', '2023-01-01 01:00:00',
'2023-01-01 01:00:00', '2023-01-01 02:00:00',
'2023-01-01 02:00:00']),
'Station':['Station A', 'Station A', 'Station B', 'Station B', 'Station C'],
'Entries':[1, 1, 2, 2, 3],
'Exits':[1, 1, 2, 2, 3]}
)
I've tried using groupby and resampling with hourly frequency, but although this fills in the gaps for hours in between, it does not backfill (i.e. adding additional rows for Station C for the hours of 12 and 1 AM).
If you first perform a long-to-wide conversion using pivot
, you can then use stack
with dropna=False
to create the additional rows:
(df
.pivot(index='Date Time', columns='Station')
.stack(dropna=False)
.fillna(0)
.astype('int')
.reset_index()
)