How do I find and calculate the percentage of change of an indicator throughout time when there are several NAs spread throughout the data?
Let's assume that this is my dataframe:
a b c d e
07 10 NA NA 8 7
06 8 NA 5 NA 8
05 11 NA 7 6 7
04 5 NA 6 5 9
03 NA NA NA NA NA
02 NA NA NA 3 12
01 NA NA 10 NA 15
Knowing that each row represents data from a certain year, I want to find the percentage of difference between the lowest row (earliest row) that contains a non-NA number and the uppermost row (latest row) that contains a non-NA number. The result should look like this:
a b c d e
1 NA -1 1.66 -0.53
The code should be easily replicated since I have hundreds of rows to analyse (eg: it shouldn't involve manually adding each column or row name). Any help will be greatly appreciated!
First finding the non-NA
s, then checking if any column had all NA
s with length
, finally do the calculation with lowest and highest indices.
sapply(df, \(x){
w_case <- which(!is.na(x))
ifelse(length(w_case) > 0,
(x[min(w_case)] - x[max(w_case)]) / x[max(w_case)],
NA)})
a b c d e
1.0000000 NA -0.5000000 1.6666667 -0.5333333
df <- structure(list(a = c(10L, 8L, 11L, 5L, NA, NA, NA), b = c(NA,
NA, NA, NA, NA, NA, NA), c = c(NA, 5L, 7L, 6L, NA, NA, 10L),
d = c(8L, NA, 6L, 5L, NA, 3L, NA), e = c(7L, 8L, 7L, 9L,
NA, 12L, 15L)), class = "data.frame", row.names = c("07",
"06", "05", "04", "03", "02", "01"))