Search code examples
pythonpandasdataframefor-loopsum

How to summarize a column based on repeated values?


I'm trying to sum the duration of the activities using a for loop with conditions.

mydf= {'Duration' : [14, 8, 6, 36, 12, 5, 3, 2, 4, 5, 8, 3, 14, 1, 27, 25, 117, 2, 962, 2, 2, 1], 
'Activity': ['Groom', 'Pause', 'Groom', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Sleep', 'Awaken', 'Groom', 'Pause', 'Groom', 'Eat', 'Cuddle', 'Come down', 'Dig', 'Forage']}
    df = pd.DataFrame(mydf)

I want to sum all the sleep duration together.

I tried:

Sleep_sum = [sum(df['Duration'] for i in df['Activity'] if [i+1]=='Sleep')] 

but this gives me an error : TypeError: can only concatenate str (not "int") to str.

I also tried this:

for i in range of len(df):
     df['Activity'][i] == 'Sleep'
     if [i] = [i+1]
     df['Duration'].sum()

Essentially, I need to sum the duration of 'Sleep" if it's followed by another row == "Sleep".

Thank you for your time!


Solution

  • You can create virtual groups to group consecutive activities (Sleep and other):

    out = (df.groupby(df['Activity'].ne(df['Activity'].shift()).cumsum(), as_index=False)
             .agg({'Duration': 'sum', 'Activity': 'first'}))
    print(out)
    
    # Output
        Duration   Activity
    0         14      Groom
    1          8      Pause
    2          6      Groom
    3         92      Sleep
    4          1     Awaken
    5         27      Groom
    6         25      Pause
    7        117      Groom
    8          2        Eat
    9        962     Cuddle
    10         2  Come down
    11         2        Dig
    12         1     Forage
    

    Update

    Just for Sleep activity:

    m = df['Activity'] == 'Sleep'
    sleep = (df.reset_index(drop=False)[m]
               .groupby(df['Activity'].ne(df['Activity'].shift()).cumsum())
               .agg({'index': 'first', 'Duration': 'sum', 'Activity': 'first'})
               .set_index('index'))
    
    out = pd.concat([df[~m], sleep]).sort_index()
    print(out)
    
    # Output
        Duration   Activity
    0         14      Groom
    1          8      Pause
    2          6      Groom
    3         92      Sleep
    13         1     Awaken
    14        27      Groom
    15        25      Pause
    16       117      Groom
    17         2        Eat
    18       962     Cuddle
    19         2  Come down
    20         2        Dig
    21         1     Forage