My data looks roughly like the following, and I want to calculate a variable for each participant (v001) holding the difference between the last two available measurements (from lnslope1 to lnslope9). Every subject has at least two measurements.
My question is:
How can I do this in R? I have read about the diff function, but I am not sure whether it can be used here. Do I have to restructure the data into long format to do this calculation? Here is the data:
test <- structure(list(v001 = c(10002, 10004, 10005, 10006, 10007, 10011,
10012, 10018), lnslope1 = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), lnslope2 = c(NA, NA,
0.313091787977149, 0.800960043896479, NA, NA, 0, 0.246092484299754
), lnslope3 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), lnslope4 = c(NA, 0.218445030532656,
NA, NA, NA, NA, 0.505548566665147, NA), lnslope5 = c(0.0507723253734231,
NA, -0.0361572285993463, NA, -0.133531392624523, -0.0824189464154196,
NA, -0.186877373329815), lnslope6 = c(0.606135803570316, NA,
NA, NA, -0.0408887702539783, 0.304548524450922, NA, 0.099090902644231
), lnslope7 = c(0.192160005794242, NA, NA, 1.37147927533475,
NA, 0.485507815781701, NA, 0.0307716586667537), lnslope8 = c(0.10951852580649,
NA, NA, 1.53234783071453, 0.145860850410924, 0.604821224703469,
NA, 0.0692660582117757), lnslope9 = c(0.374693449441411, NA,
NA, 0.996237878364571, NA, 0.852777326151829, NA, 0.0299842570512681
)), .Names = c("v001", "lnslope1", "lnslope2", "lnslope3", "lnslope4",
"lnslope5", "lnslope6", "lnslope7", "lnslope8", "lnslope9"), row.names = c(NA,
8L), class = "data.frame")
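On the long-format question: restructuring is not strictly required (the row-wise answer below works on the wide data), but in long format diff() applies naturally per participant. A minimal base-R sketch, assuming the data frame is called test; the first three participants are re-entered so the block runs on its own (skip that part when using the full data above). The names long, time and last_diff are illustrative, and stack() is just one way to reshape.

```r
# First three participants copied from the data above (lnslope1 to lnslope9)
test <- data.frame(
  v001     = c(10002, 10004, 10005),
  lnslope1 = c(NA_real_, NA_real_, NA_real_),
  lnslope2 = c(NA, NA, 0.313091787977149),
  lnslope3 = c(NA_real_, NA_real_, NA_real_),
  lnslope4 = c(NA, 0.218445030532656, NA),
  lnslope5 = c(0.0507723253734231, NA, -0.0361572285993463),
  lnslope6 = c(0.606135803570316, NA, NA),
  lnslope7 = c(0.192160005794242, NA, NA),
  lnslope8 = c(0.10951852580649, NA, NA),
  lnslope9 = c(0.374693449441411, NA, NA)
)

# Reshape to long format: one row per participant x measurement.
# stack() stacks column by column, so the IDs repeat once per column.
long <- stack(test[-1])                        # columns: values, ind
long$v001 <- rep(test$v001, times = ncol(test) - 1)
long$time <- as.integer(sub("lnslope", "", long$ind))

long <- long[!is.na(long$values), ]            # keep observed measurements only
long <- long[order(long$v001, long$time), ]    # chronological order per participant

# diff() on the last two observed values of each participant
last_diff <- tapply(long$values, long$v001, function(x) {
  if (length(x) > 1) diff(tail(x, 2)) else NA_real_
})
last_diff
```

Participant 10004 has only one observed value, so it gets NA, matching the answer's function below.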
Here is a somewhat roundabout way of doing it with a user-defined function and apply() (test is your data frame). I like this approach because each step is clearly defined:
# Returns the difference between the last two non-NA elements of a row
find_difference <- function(row) {
  # Remove NAs
  row <- row[!is.na(row)]
  # Find number of non-NA entries
  len <- length(row)
  # Check to see if there is more than 1 non-NA observation
  if (len > 1) {
    difference <- row[len] - row[len - 1]
    return(difference)
  } else {
    # If not more than one non-NA observation, return NA
    return(NA)
  }
}
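Since the question mentions diff(): the core of find_difference() can equivalently be written with diff() and tail(). A minimal sketch of that variant (the name find_difference_diff is just illustrative); it returns NA_real_ instead of a logical NA so the result is always numeric.

```r
# Variant of find_difference() built on diff(): drop NAs, keep the last two
# observed values, and return later minus earlier
find_difference_diff <- function(row) {
  row <- row[!is.na(row)]
  if (length(row) > 1) {
    diff(tail(row, 2))
  } else {
    NA_real_  # fewer than two observed measurements
  }
}

# Quick checks on plain vectors (lnslope1 to lnslope5 of participants 10012 and 10004)
find_difference_diff(c(NA, 0, NA, 0.505548566665147, NA))
find_difference_diff(c(NA, NA, NA, 0.218445030532656, NA))
```

It can be dropped into the same apply() call in place of find_difference.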
# Use apply across each row (MARGIN = 1) with defined function
# Exclude the first column because it contains the ID
test$diff <- apply(test[, 2:ncol(test)], MARGIN = 1, FUN = find_difference)
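If apply() turns out to be slow on a large data set, a vectorized alternative is possible. This is a sketch of a different technique (max.col() on the matrix of observed/missing flags), not the method above; it assumes every column after v001 is a measurement in chronological order, and re-enters the first three participants as test so it runs on its own.

```r
# First three participants copied from the data above
test <- data.frame(
  v001     = c(10002, 10004, 10005),
  lnslope1 = c(NA_real_, NA_real_, NA_real_),
  lnslope2 = c(NA, NA, 0.313091787977149),
  lnslope3 = c(NA_real_, NA_real_, NA_real_),
  lnslope4 = c(NA, 0.218445030532656, NA),
  lnslope5 = c(0.0507723253734231, NA, -0.0361572285993463),
  lnslope6 = c(0.606135803570316, NA, NA),
  lnslope7 = c(0.192160005794242, NA, NA),
  lnslope8 = c(0.10951852580649, NA, NA),
  lnslope9 = c(0.374693449441411, NA, NA)
)

m    <- as.matrix(test[, -1])          # measurements only, ID column dropped
obs  <- (!is.na(m)) + 0                # 1 = observed, 0 = missing
rows <- seq_len(nrow(m))

last <- max.col(obs, ties.method = "last")  # column of the last observed value
obs[cbind(rows, last)] <- 0                 # hide it ...
prev <- max.col(obs, ties.method = "last")  # ... to find the one before it

test$diff <- m[cbind(rows, last)] - m[cbind(rows, prev)]
test$diff[rowSums(!is.na(m)) < 2] <- NA     # fewer than two measurements
test$diff
```

Rows with fewer than two observations would otherwise pick an arbitrary column, which is why they are reset to NA at the end.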
Result:
   v001 lnslope1  lnslope2 lnslope3  lnslope4    lnslope5    lnslope6   lnslope7   lnslope8   lnslope9       diff
1 10002       NA        NA       NA        NA  0.05077233  0.60613580 0.19216001 0.10951853 0.37469345  0.2651749
2 10004       NA        NA       NA 0.2184450          NA          NA         NA         NA         NA         NA
3 10005       NA 0.3130918       NA        NA -0.03615723          NA         NA         NA         NA -0.3492490
4 10006       NA 0.8009600       NA        NA          NA          NA 1.37147928 1.53234783 0.99623788 -0.5361100
5 10007       NA        NA       NA        NA -0.13353139 -0.04088877         NA 0.14586085         NA  0.1867496
6 10011       NA        NA       NA        NA -0.08241895  0.30454852 0.48550782 0.60482122 0.85277733  0.2479561
7 10012       NA 0.0000000       NA 0.5055486          NA          NA         NA         NA         NA  0.5055486
8 10018       NA 0.2460925       NA        NA -0.18687737  0.09909090 0.03077166 0.06926606 0.02998426 -0.0392818