Search code examples
rdataframepercentagedifference

How to calculate the percentage of change on rows with NA


How do I find and calculate the percentage of change of an indicator throughout time when there are several NAs spread throughout the data?

Let's assume that this is my dataframe:

     a   b   c   d   e
07  10  NA  NA   8   7
06   8  NA   5  NA   8
05  11  NA   7   6   7
04   5  NA   6   5   9
03  NA  NA  NA  NA  NA
02  NA  NA  NA   3  12
01  NA  NA  10  NA  15

Knowing that each row represents data from a certain year, I want to find the percentage of difference between the lowest row (earliest row) that contains a non-NA number and the uppermost row (latest row) that contains a non-NA number. The result should look like this:

 a     b     c     d     e
 1    NA    -1  1.66 -0.53

The code should be easily replicated since I have hundreds of rows to analyse (eg: it shouldn't involve manually adding each column or row name). Any help will be greatly appreciated!


Solution

  • First finding the non-NAs, then checking if any column had all NAs with length, finally do the calculation with lowest and highest indices.

    sapply(df, \(x){ 
      w_case <- which(!is.na(x))
      ifelse(length(w_case) > 0, 
             (x[min(w_case)] - x[max(w_case)]) / x[max(w_case)], 
             NA)})
             a          b          c          d          e 
     1.0000000         NA -0.5000000  1.6666667 -0.5333333
    

    Data

    df <- structure(list(a = c(10L, 8L, 11L, 5L, NA, NA, NA), b = c(NA, 
    NA, NA, NA, NA, NA, NA), c = c(NA, 5L, 7L, 6L, NA, NA, 10L), 
        d = c(8L, NA, 6L, 5L, NA, 3L, NA), e = c(7L, 8L, 7L, 9L, 
        NA, 12L, 15L)), class = "data.frame", row.names = c("07", 
    "06", "05", "04", "03", "02", "01"))