
Cumsum with reset at a flagged column in R?


This is my first time asking a question so bear with me.

My dataset (df) is like so:

animal   azimuth   south   distance
 pb1      187.561   1       1.992 
 pb1      147.219   1       8.567
 pb1      71.032    0       5.754
 pb1      119.502   1       10.451
 pb2      101.702   1       9.227
 pb2      85.715    0       8.821

I want to create an additional column (df$cumdist) that adds cumulative distance, but within each individual animal and only if df$south==1. I want the cumulative sum to reset with df$south==0.

This is what I would like the result to be (done manually):

animal   azimuth   south   distance  cumdist
 pb1      187.561   1       1.992     1.992
 pb1      147.219   1       8.567     10.559 
 pb1      71.032    0       5.754     0 
 pb1      119.502   1       10.451    10.451
 pb2      101.702   1       9.227     9.227 
 pb2      85.715    0       8.821     0

This is code I tried to implement the cumsum:

df$cumdist <- cumsum(ifelse(df$south==1, df$distance, 0))

While it successfully stops adding when df$south==0, it does not reset. Additionally, I know I will need to embed this in a for loop to subset by animal.

Thanks so much!


Solution

  • We multiply 'south' by 'distance' to create 'cumdist', which sets the 'distance' values to 0 on every row where 'south' is 0. We then group by 'animal' and by a run identifier ('grp') built from the cumulative sum of the logical vector (south == 0), which increases by one at each reset row. Within each group we take the cumsum of 'cumdist', then ungroup and drop the helper column (grp), which is no longer needed.

    library(dplyr)
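    dfN %>% 
      mutate(cumdist = south * distance) %>%           # zero out rows where south == 0
      group_by(animal, grp = cumsum(south == 0)) %>%   # new group at every south == 0 row
      mutate(cumdist = cumsum(cumdist)) %>%            # running total within each group
      ungroup() %>%
      select(-grp)                                     # drop the helper column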
    # A tibble: 6 x 5
    #  animal azimuth south distance cumdist
    #  <chr>    <dbl> <int>    <dbl>   <dbl>
    #1 pb1      188.      1     1.99    1.99
    #2 pb1      147.      1     8.57   10.6 
    #3 pb1       71.0     0     5.75    0   
    #4 pb1      120.      1    10.5    10.5 
    #5 pb2      102.      1     9.23    9.23
    #6 pb2       85.7     0     8.82    0   
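
To see why the reset happens, it may help to look at the grouping key on its own. The following is only a sketch on the same six rows; the names `grp`, `key` and `zeroed` are illustrative and not part of the answer above.

```r
# Same six rows as the example data, reduced to the relevant columns
animal   <- c("pb1", "pb1", "pb1", "pb1", "pb2", "pb2")
south    <- c(1L, 1L, 0L, 1L, 1L, 0L)
distance <- c(1.992, 8.567, 5.754, 10.451, 9.227, 8.821)

# Run identifier: increases by one on every row where south == 0
grp <- cumsum(south == 0)
grp
# [1] 0 0 1 1 1 2

# Animal and run id together define the groups the cumsum runs over
key <- paste(animal, grp)
key
# [1] "pb1 0" "pb1 0" "pb1 1" "pb1 1" "pb2 1" "pb2 2"

# Distances with the south == 0 rows zeroed out
zeroed <- south * distance

# Running total within each (animal, grp) group
cumdist <- ave(zeroed, animal, grp, FUN = cumsum)
cumdist
# [1]  1.992 10.559  0.000 10.451  9.227  0.000
```

Each south == 0 row opens a new group and contributes 0 to it, so the running total restarts from that row.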
    

    Or a similar approach with base R, where ave() applies cumsum within each combination of 'animal' and run identifier:

    with(dfN, ave(distance * south, animal, cumsum(!south), FUN = cumsum))
    #[1]  1.992 10.559  0.000 10.451  9.227  0.000
    

    data

    dfN <- structure(list(animal = c("pb1", "pb1", "pb1", "pb1", "pb2", 
    "pb2"), azimuth = c(187.561, 147.219, 71.032, 119.502, 101.702, 
    85.715), south = c(1L, 1L, 0L, 1L, 1L, 0L), distance = c(1.992, 
    8.567, 5.754, 10.451, 9.227, 8.821)), class = "data.frame", 
    row.names = c(NA, -6L))
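
Since the question mentions a for loop, here is a rough sketch of the explicit-loop equivalent for comparison. It assumes the rows are ordered so that each animal's rows are contiguous. The names `running` and `new_animal` are illustrative, and the data frame is rebuilt without the 'azimuth' column to keep the example short.

```r
# Example data, reduced to the columns the loop needs
dfN <- data.frame(
  animal   = c("pb1", "pb1", "pb1", "pb1", "pb2", "pb2"),
  south    = c(1L, 1L, 0L, 1L, 1L, 0L),
  distance = c(1.992, 8.567, 5.754, 10.451, 9.227, 8.821)
)

cumdist <- numeric(nrow(dfN))
running <- 0

for (i in seq_len(nrow(dfN))) {
  # A new animal starts on the first row or when the id changes
  new_animal <- i == 1 || dfN$animal[i] != dfN$animal[i - 1]

  # Reset the running total for a new animal or a south == 0 row
  if (new_animal || dfN$south[i] == 0) {
    running <- 0
  }

  # Only rows with south == 1 add their distance
  if (dfN$south[i] == 1) {
    running <- running + dfN$distance[i]
  }

  cumdist[i] <- running
}

dfN$cumdist <- cumdist
dfN$cumdist
# [1]  1.992 10.559  0.000 10.451  9.227  0.000
```

The vectorised versions above give the same values without the loop and do not depend on the running state being tracked by hand.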