Search code examples
pythonpandasconcatenationseries

sum two Pandas Series with count events and different rows


I am trying to merge two pandas.Series with different datetime but have some problem to get proper values in the final merge. I saw some posts where they keep both series in a dataframe but I would like to return a single serie which contains the sum of both.

Background: I have pandas series with count of detected people in rooms and I would like to merge them into the count for the building (containing 2 rooms in the example below). I could do that by aggregating rooms together until I get all rooms into one (which is then the building)

I feel that I would have to sort the series and then go through both row by row to get the correct count. So far I use the zip() function to go through the series row by row (which are sorted) but I suspect there is a nicer way to do it. Any ideas?

Here is a snippet code:

# The data in
room1_idx = pd.to_datetime([
    '2023-08-11T17:00:44',  # 6 people counted
    '2023-08-11T17:06:47',  # 7 people counted
    '2023-08-11T17:06:49',  # 8 people counted
    '2023-08-11T17:07:00',  # 10 people counted
    '2023-08-11T17:07:20',  # 8 people counted
    ])
room1 = pd.Series([6, 7, 8, 10, 8], index=room1_idx, name="Room 1")

room2_idx = pd.to_datetime([
    '2023-08-11T17:06:45',  # 1 people counted
    '2023-08-11T17:06:46',  # 4 people counted
    '2023-08-11T17:06:47',  # 5 people counted
    '2023-08-11T17:07:02',  # 10 people counted
    '2023-08-11T17:07:10',  # 7 people counted
    '2023-08-11T17:07:30',  # 2 people counted
    ])
room2 = pd.Series([1, 4, 5, 10, 7, 2], index=room2_idx, name="Room 2")

print(room1)
print(room2)

What I would like to get is a function which would output this:

building_idx = pd.to_datetime([
    '2023-08-11 17:00:44',  # 6+0 people counted
    '2023-08-11 17:06:45',  # 6+1 people counted
    '2023-08-11 17:06:46',  # 6+4 people counted
    '2023-08-11 17:06:47',  # 7+5 people counted
    '2023-08-11 17:06:49',  # 8+5 people counted
    '2023-08-11 17:07:00',  # 10+5 people counted
    '2023-08-11 17:07:02',  # 10+10 people counted
    '2023-08-11 17:07:10',  # 10+7 people counted
    '2023-08-11 17:07:20',  # 8+7 people counted
    '2023-08-11 17:07:30',  # 9+2 people counted
    ])
building = pd.Series([6, 7, 10, 12, 13, 15, 20, 17, 15, 11], index=building_idx, name="Building")
print(building)

Solution

  • concat to align, ffill, and sum:

    out = (pd
       .concat([room1, room2], axis=1)
       .ffill()
       .sum(axis=1)
     )
    

    Output:

    2023-08-11 17:00:44     6.0
    2023-08-11 17:06:45     7.0
    2023-08-11 17:06:46    10.0
    2023-08-11 17:06:47    12.0
    2023-08-11 17:06:49    13.0
    2023-08-11 17:07:00    15.0
    2023-08-11 17:07:02    20.0
    2023-08-11 17:07:10    17.0
    2023-08-11 17:07:20    15.0
    2023-08-11 17:07:30    10.0
    dtype: float64