Search code examples
rlag

Calculate average real variability for successive multiple measurements in R


I have a long-form dataframe, with a column (B) including the absolute successive differences between values in column (A), for each individual's ID separately.

ID = c("1", "1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2")
A = c("120", "115", "125", "119", "128", "129", "130", "140", "142", "143", "145", "144", "148")
B = c("NA", "5", "10", "6", "9", "1", "1", "NA", "2", "1", "2", "1", "4")

DF <- data.frame(ID, A, B)

I would like to create a new column (C), that is the sum of the absolute differences before and including each value, divided by (the number of measurements used to calculate it minus 1).
This is what I would like the data to look like: enter image description here

I hope this makes sense, any help greatly appreciated!


Solution

  • Here's a tidyverse solution. You can first group_by the ID, then divide the cumulative sum (cumsum) of B by the row_number minus one. You can only do this after omitting the first row of each group and replacing it with NA

    Note also that in your example, the 'numeric' columns are actually character vectors, so have to be coerced to numeric first.

    library(tidyverse)
    
    DF %>%
      mutate(across(A:B, \(x) suppressWarnings(as.numeric(x)))) %>%
      group_by(ID) %>%
      mutate(C = c(NA, cumsum(B[-1])/(row_number() - 1)[-1]))
    #> # A tibble: 13 x 4
    #> # Groups:   ID [2]
    #>    ID        A     B     C
    #>    <chr> <dbl> <dbl> <dbl>
    #>  1 1       120    NA NA   
    #>  2 1       115     5  5   
    #>  3 1       125    10  7.5 
    #>  4 1       119     6  7   
    #>  5 1       128     9  7.5 
    #>  6 1       129     1  6.2 
    #>  7 1       130     1  5.33
    #>  8 2       140    NA NA   
    #>  9 2       142     2  2   
    #> 10 2       143     1  1.5 
    #> 11 2       145     2  1.67
    #> 12 2       144     1  1.5 
    #> 13 2       148     4  2
    

    Created on 2022-11-11 with reprex v2.0.2