I am trying to merge two pandas.Series with different datetime but have some problem to get proper values in the final merge. I saw some posts where they keep both series in a dataframe but I would like to return a single serie which contains the sum of both.
Background: I have pandas series with count of detected people in rooms and I would like to merge them into the count for the building (containing 2 rooms in the example below). I could do that by aggregating rooms together until I get all rooms into one (which is then the building)
I feel that I would have to sort the series and then go through both row by row to get the correct count. So far I use the zip() function to go through the series row by row (which are sorted) but I suspect there is a nicer way to do it. Any ideas?
Here is a snippet code:
# The data in
room1_idx = pd.to_datetime([
'2023-08-11T17:00:44', # 6 people counted
'2023-08-11T17:06:47', # 7 people counted
'2023-08-11T17:06:49', # 8 people counted
'2023-08-11T17:07:00', # 10 people counted
'2023-08-11T17:07:20', # 8 people counted
])
room1 = pd.Series([6, 7, 8, 10, 8], index=room1_idx, name="Room 1")
room2_idx = pd.to_datetime([
'2023-08-11T17:06:45', # 1 people counted
'2023-08-11T17:06:46', # 4 people counted
'2023-08-11T17:06:47', # 5 people counted
'2023-08-11T17:07:02', # 10 people counted
'2023-08-11T17:07:10', # 7 people counted
'2023-08-11T17:07:30', # 2 people counted
])
room2 = pd.Series([1, 4, 5, 10, 7, 2], index=room2_idx, name="Room 2")
print(room1)
print(room2)
What I would like to get is a function which would output this:
building_idx = pd.to_datetime([
'2023-08-11 17:00:44', # 6+0 people counted
'2023-08-11 17:06:45', # 6+1 people counted
'2023-08-11 17:06:46', # 6+4 people counted
'2023-08-11 17:06:47', # 7+5 people counted
'2023-08-11 17:06:49', # 8+5 people counted
'2023-08-11 17:07:00', # 10+5 people counted
'2023-08-11 17:07:02', # 10+10 people counted
'2023-08-11 17:07:10', # 10+7 people counted
'2023-08-11 17:07:20', # 8+7 people counted
'2023-08-11 17:07:30', # 9+2 people counted
])
building = pd.Series([6, 7, 10, 12, 13, 15, 20, 17, 15, 11], index=building_idx, name="Building")
print(building)
concat
to align, ffill
, and sum
:
out = (pd
.concat([room1, room2], axis=1)
.ffill()
.sum(axis=1)
)
Output:
2023-08-11 17:00:44 6.0
2023-08-11 17:06:45 7.0
2023-08-11 17:06:46 10.0
2023-08-11 17:06:47 12.0
2023-08-11 17:06:49 13.0
2023-08-11 17:07:00 15.0
2023-08-11 17:07:02 20.0
2023-08-11 17:07:10 17.0
2023-08-11 17:07:20 15.0
2023-08-11 17:07:30 10.0
dtype: float64