
Issue with local variables in an R custom function


I've got a dataset:

> interval
#   V1 V2 V3 ID
# 1 NA  1  2  1
# 2  2  2  3  2
# 3  3 NA  1  3
# 4  4  2  2  4
# 5 NA  5  1  5

> dput(interval)
structure(list(V1 = c(NA, 2, 3, 4, NA),
               V2 = c(1, 2, NA, 2, 5),
               V3 = c(2, 3, 1, 2, 1),
               ID = 1:5),
          row.names = c(NA, -5L), class = "data.frame")

I would like to extract, for every NA in the data, the previous non-NA value in the same column (or the next one, if the NA is in the first row), and store it as a local variable in a custom function, because I have to perform other operations on every row based on this value (which should change for every row I'm applying the function to). I've written this function to print the local variables, but when I apply it, the output is not what I want:

myFunction <- function(x) {
  position <- as.data.frame(which(is.na(interval), arr.ind = TRUE))
  tempVar <- ifelse(interval$ID == 1,
                    interval[position$row + 1, position$col],
                    interval[position$row - 1, position$col])
  return(tempVar)
}

I was expecting to get something like this:

# [1]    2
# [2]    2
# [3]    4

But I get something pretty messed up instead.
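A likely cause of the messy output (this is a reading of the code above, not something stated in the answer) is that `interval[rows, cols]` with two index vectors selects every row/column combination rather than one element per (row, col) pair. On top of that, `ifelse(interval$ID == 1, ...)` returns one element per row of `interval` (five), not one per NA, and the function ignores its argument `x` and reads the global `interval` instead. The minimal sketch below contrasts the two indexing styles; the `rows`/`cols` vectors are hand-picked for illustration.

```r
# The data from the question (same values as the dput() output above)
interval <- data.frame(V1 = c(NA, 2, 3, 4, NA),
                       V2 = c(1, 2, NA, 2, 5),
                       V3 = c(2, 3, 1, 2, 1),
                       ID = 1:5)

rows <- c(2, 2, 4)  # row to read the replacement value from, one per NA
cols <- c(1, 2, 1)  # column of each NA

# Two index vectors select all row/column combinations: a 3 x 3 data frame
combos <- interval[rows, cols]
dim(combos)
# [1] 3 3

# A two-column (row, col) matrix selects individual element pairs instead
pairs <- as.matrix(interval)[cbind(rows, cols)]
pairs
# [1] 2 2 4
```

The answer below relies on the second style: `which(..., arr.ind = TRUE)` already returns such a (row, col) matrix.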


Solution

  • Here's attempt number 1:

    dat <- read.table(header=TRUE, text='
    V1 V2 V3 ID
    NA 1  2  1
    2  2  3  2
    3  NA 1  3
    4  2  2  4
    NA 5  1  5')
    myfunc1 <- function(x) {
      ind <- which(is.na(x), arr.ind=TRUE)
      # since it appears you want them in row-first sorted order;
      # drop = FALSE keeps ind a matrix even when there is only one NA
      ind <- ind[order(ind[,1], ind[,2]), , drop = FALSE]
      # catch first-row NA
      ind[,1] <- ifelse(ind[,1] == 1L, 2L, ind[,1] - 1L)
      x[ind]
    }
    myfunc1(dat)
    # [1] 2 2 4
    

    The problem with this shows up when there is a second, "stacked" NA in the same column, because the neighboring value it looks up is itself NA:

    dat2 <- dat
    dat2[2,1] <- NA
    dat2
    #   V1 V2 V3 ID
    # 1 NA  1  2  1
    # 2 NA  2  3  2
    # 3  3 NA  1  3
    # 4  4  2  2  4
    # 5 NA  5  1  5
    myfunc1(dat2)
    # [1] NA NA  2  4
    

    One fix/safeguard against this is to use zoo::na.locf, which fills each NA with the "last observation carried forward". Since the top row is a special case, we do it twice, the second time in reverse (fromLast = TRUE). This gives us the nearest non-NA value in the column: the one above, or the one below when there is nothing above.

    library(zoo)
    myfunc2 <- function(x) {
      ind <- which(is.na(x), arr.ind=TRUE)
      # since it appears you want them in row-first sorted order;
      # drop = FALSE keeps ind a matrix even when there is only one NA
      ind <- ind[order(ind[,1], ind[,2]), , drop = FALSE]
      # this is to guard against stacked NA
      x <- apply(x, 2, zoo::na.locf, na.rm = FALSE)
      # this special-case is when there are one or more NAs at the top of a column
      x <- apply(x, 2, zoo::na.locf, fromLast = TRUE, na.rm = FALSE)
      x[ind]
    }
    myfunc2(dat2)
    # [1] 3 3 2 4
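If you would rather not depend on zoo, the same two-pass fill can be sketched in base R. This is a swapped-in technique, not part of the original answer: the hypothetical helper `fill_down()` carries the last non-NA value forward by taking a running maximum (`cummax()`) of the positions of non-NA entries, and reversing the vector before and after a second pass fills NAs at the top of a column from below. The names `fill_down`, `fill_both` and `myfunc3` are illustrative only.

```r
# Hypothetical helper: last observation carried forward, base R only.
# For each position, find the index of the most recent non-NA entry
# (0 if there is none yet) and look the value up at that index.
fill_down <- function(v) {
  last_seen <- cummax(ifelse(is.na(v), 0L, seq_along(v)))
  last_seen[last_seen == 0L] <- NA_integer_  # leading NAs stay NA
  v[last_seen]
}

# Forward pass, then a reversed pass for NAs at the top of a column
fill_both <- function(v) rev(fill_down(rev(fill_down(v))))

myfunc3 <- function(x) {
  ind <- which(is.na(x), arr.ind = TRUE)
  # row-first order; drop = FALSE keeps a matrix even with a single NA
  ind <- ind[order(ind[, 1], ind[, 2]), , drop = FALSE]
  filled <- apply(as.matrix(x), 2, fill_both)
  filled[ind]
}

# Same values as dat2 above, built directly
dat2 <- data.frame(V1 = c(NA, NA, 3, 4, NA),
                   V2 = c(1, 2, NA, 2, 5),
                   V3 = c(2, 3, 1, 2, 1),
                   ID = 1:5)
myfunc3(dat2)
# [1] 3 3 2 4
```

The trade-off is readability: `zoo::na.locf` states the intent directly, while the `cummax()` version avoids an extra package at the cost of a less obvious helper.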