Search code examples
pythoncumsum

How to calculate cumulative covered distances over a sequence of different days and nights in Python


I would calculate the cumulative covered distance performed by a fish during each day and night of the entire study period.

I have this kind of dataframe, where 'Dates' corresponds to the date of the recording, 'Covered Distance' corresponds to the displacement measurement (km) and DayNight says if the displacement occured during the night or during the day (after or before sunset):

 Dates     Covered Distance   DayNight
0     2019-08-12      1              Day
1     2019-08-12      2              Night
2     2019-08-12      1              Night
3     2019-08-12      2              Night
4     2019-08-13      3              Night
5     2019-08-13      4              Night
6     2019-08-13      2              Day
7     2019-08-13      1              Day
8     2019-08-13      2              Night
9     2019-08-14      2              Night
10    2019-08-14      3              Night
10    2019-08-14      1              Day
....

I would to create a new dataframe like this :

       Dates     Covered Distance   DayNight    Cum_sum distance Day/night 
0     2019-08-12      1                Day          1
1     2019-08-12      4                Day          5
2     2019-08-12      2                Night        2
3     2019-08-12      1                Night        3
4     2019-08-12      2                Night        5
5     2019-08-13      3                Night        8
6     2019-08-13      4                Night        12
7     2019-08-13      2                Day          2
8     2019-08-13      1                Day          3
9     2019-08-13      2                Night        2
10    2019-08-14      2                Night        4
11    2019-08-14      3                Night        7
12    2019-08-14      1                Day          1
...

Actually I tried to use these two codes, but they both calculate the total cumulative distances for all nights and days (which is good to have a general idea, but i would be more precise):

Df2 = Df1.groupby('DayNight').Distance.sum()
or
Df2 = Df1.groupby('DayNight').cumsum().reset_index()

The problem is that a single night occurs over two dates (i.e., 2019-08-12 and 2019-08-13), so it seems difficult to me to separate each different night. Maybe i should enumerate each day and night, like day1, day1, night 1, night 1, night 1, ..., day 2 day 2, ..., night 3, night 3,...

Thanks for the help!!


Solution

  • Assuming your dates are already sorted, you can create virtual groups then apply cumulative sum:

    grp = df['DayNight'].ne(df['DayNight'].shift()).cumsum()
    df['Cumsum distance'] = df.groupby(grp)['Covered Distance'].cumsum()
    print(df)
    
    # Output
             Dates  Covered Distance DayNight  Cumsum distance
    0   2019-08-12                 1      Day                1
    1   2019-08-12                 4      Day                5
    2   2019-08-12                 2    Night                2
    3   2019-08-12                 1    Night                3
    4   2019-08-12                 2    Night                5
    5   2019-08-13                 3    Night                8
    6   2019-08-13                 4    Night               12
    7   2019-08-13                 2      Day                2
    8   2019-08-13                 1      Day                3
    9   2019-08-13                 2    Night                2
    10  2019-08-14                 2    Night                4
    11  2019-08-14                 3    Night                7
    12  2019-08-14                 1      Day                1
    

    How the groups are created ? We compare the current DayNight with the previous one. If the value is different (ne: not equal), we start a new group.