Search code examples
rsumlag

How to calculate a value row that is based on the previous value in the same column (in R)?


For a time series analysis, I have a data frame that is based on periods. I have a column with starters and a column with patients who discontinue therapy. Now I want to calculate the current users of therapy per period. I calculated the difference between 'starters' and 'discontinue' as an intermediate step.

> df <- data.frame(
period = c(1,2,3,4,5,6), 
starters = c(595, 113, 36, 489, 28, 101),
discontinue = c(0, 11, 6, 8, 14, 8))

> df$difference <- df$starters - df$discontinue
> df

  period starters discontinue difference
1      1      595           0        595
2      2      113          11        102
3      3       36           6         30
4      4      489           8        481
5      5       28          14         14
6      6      101           8         93

To calculate current users, I would like to add a column 'current.users' with initial value = df$difference[1] and adds the difference each row. Output should look like:

> df
  period starters discontinue difference current.users
1      1      595           0        595           595
2      2      113          11        102           697
3      3       36           6         30           727
4      4      489           8        481          1208
5      5       28          14         14          1222
6      6      101           8         93          1315

I tried to use for loops and data.table, but I can not calculate the next value in the column based on the previous value in the same column. Does anyone know the correct code for this issue? Thanks in advance!


Solution

  • With dplyr

    library(tidyverse)
    
    df %>%  
      mutate(difference = starters - discontinue, 
             current.users = cumsum(difference))
    
      period starters discontinue difference current.users
    1      1      595           0        595           595
    2      2      113          11        102           697
    3      3       36           6         30           727
    4      4      489           8        481          1208
    5      5       28          14         14          1222
    6      6      101           8         93          1315
    

    Without the difference column

    df %>%  
      mutate(current.users = cumsum(starters - discontinue))
    
      period starters discontinue current.users
    1      1      595           0           595
    2      2      113          11           697
    3      3       36           6           727
    4      4      489           8          1208
    5      5       28          14          1222
    6      6      101           8          1315