Tags: r, function, time-series, difference, longitudinal

How can I calculate the difference between the last two values in R?


My data look roughly like the following. For each participant (v001), I want to calculate a variable holding the difference between the last two available measurements (out of lnslope1 to lnslope9). Every subject has at least two measurements.

My question is:

How can I do this in R? I have read about the diff function, but I am not sure whether it can be used here. Do I have to restructure the data into long format to do this calculation? Here is the data:

    test <- structure(list(v001 = c(10002, 10004, 10005, 10006, 10007, 10011, 
    10012, 10018), lnslope1 = c(NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_), lnslope2 = c(NA, NA, 
    0.313091787977149, 0.800960043896479, NA, NA, 0, 0.246092484299754
    ), lnslope3 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), lnslope4 = c(NA, 0.218445030532656, 
    NA, NA, NA, NA, 0.505548566665147, NA), lnslope5 = c(0.0507723253734231, 
    NA, -0.0361572285993463, NA, -0.133531392624523, -0.0824189464154196, 
    NA, -0.186877373329815), lnslope6 = c(0.606135803570316, NA, 
    NA, NA, -0.0408887702539783, 0.304548524450922, NA, 0.099090902644231
    ), lnslope7 = c(0.192160005794242, NA, NA, 1.37147927533475, 
    NA, 0.485507815781701, NA, 0.0307716586667537), lnslope8 = c(0.10951852580649, 
    NA, NA, 1.53234783071453, 0.145860850410924, 0.604821224703469, 
    NA, 0.0692660582117757), lnslope9 = c(0.374693449441411, NA, 
    NA, 0.996237878364571, NA, 0.852777326151829, NA, 0.0299842570512681
    )), .Names = c("v001", "lnslope1", "lnslope2", "lnslope3", "lnslope4", 
    "lnslope5", "lnslope6", "lnslope7", "lnslope8", "lnslope9"), row.names = c(NA, 
    8L), class = "data.frame")

Solution

  • Here is a roundabout way of doing it with a user-defined function and apply (test is your data). I like this approach because each step is clearly defined:

    # Finds the difference between the last two non-NA elements of a row
    find_difference <- function(row) {
      # Remove NAs
      row <- row[!is.na(row)]

      # Find number of non-NA entries
      len <- length(row)

      # Check to see if there is more than 1 non-NA observation
      if (len > 1) {
        # Last value minus second-to-last value
        difference <- row[len] - row[len - 1]
        return(difference)
      } else {
        # If there are fewer than two non-NA observations, return NA
        return(NA_real_)
      }
    }
    
    # Use apply across each row (MARGIN = 1) with defined function
    # Exclude the first column because it contains the ID
    test$diff <- apply(test[, 2:ncol(test)], MARGIN = 1, FUN = find_difference)
    

Result:

       v001 lnslope1  lnslope2 lnslope3  lnslope4    lnslope5    lnslope6   lnslope7   lnslope8   lnslope9       diff
    1 10002       NA        NA       NA        NA  0.05077233  0.60613580 0.19216001 0.10951853 0.37469345  0.2651749
    2 10004       NA        NA       NA 0.2184450          NA          NA         NA         NA         NA         NA
    3 10005       NA 0.3130918       NA        NA -0.03615723          NA         NA         NA         NA -0.3492490
    4 10006       NA 0.8009600       NA        NA          NA          NA 1.37147928 1.53234783 0.99623788 -0.5361100
    5 10007       NA        NA       NA        NA -0.13353139 -0.04088877         NA 0.14586085         NA  0.1867496
    6 10011       NA        NA       NA        NA -0.08241895  0.30454852 0.48550782 0.60482122 0.85277733  0.2479561
    7 10012       NA 0.0000000       NA 0.5055486          NA          NA         NA         NA         NA  0.5055486
    8 10018       NA 0.2460925       NA        NA -0.18687737  0.09909090 0.03077166 0.06926606 0.02998426 -0.0392818
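The question also asks whether diff can be used and whether the data must first be reshaped to long format. Below is a minimal sketch of both options, run on a small made-up data frame in the same wide layout (not the question's data). It assumes the ID is the first column and that the measurement columns are ordered from earliest to latest; last_two_diff, long and res_long are illustrative names, not part of the original answer.

```r
# Hypothetical toy data in the same wide layout as the question:
# an ID column (v001) followed by measurement columns containing NAs.
cols <- paste0("lnslope", 1:3)
test <- data.frame(
  v001     = c(1, 2, 3),
  lnslope1 = c(1, 0.5, 0),
  lnslope2 = c(0.25, NA, -0.5),
  lnslope3 = c(0.75, NA, NA)
)

# Variant 1 (wide format): diff() applied to the last two non-NA values.
# last_two_diff is a hypothetical helper, equivalent to find_difference.
last_two_diff <- function(row) {
  row <- row[!is.na(row)]                # drop missing measurements
  if (length(row) < 2) return(NA_real_)  # need at least two values
  diff(tail(row, 2))                     # last value minus second-to-last
}
test$diff <- apply(test[, cols], MARGIN = 1, FUN = last_two_diff)

# Variant 2 (long format): one row per participant and visit.
long <- reshape(test[, c("v001", cols)],
                direction = "long",
                varying   = cols,
                v.names   = "lnslope",
                timevar   = "visit",
                idvar     = "v001")
long <- long[!is.na(long$lnslope), ]             # keep observed values only
long <- long[order(long$v001, long$visit), ]     # chronological within ID
res_long <- tapply(long$lnslope, long$v001, function(x)
  if (length(x) < 2) NA_real_ else diff(tail(x, 2)))

print(test$diff)
print(res_long)
```

Both variants give 0.5, NA and -0.5 for the three toy rows (row 1 uses only its last two values, and the zero in row 3 counts as a measurement). Reshaping is therefore not required; the long-format version is mainly convenient if the rest of the analysis already works on long data.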