
create and name NA indicator variables for a large data frame


Okay, I'm close. Everything works except the last loop, the one that builds compound, where I get hung up on what looks like a data type issue. Copy and run to your heart's content.

x <- c(1:12)
dim(x) <- c(3,4)
x[2,2] <- NA
x[3,3] <- NA
colnames(x) <- c("A","B","C","D")

x

newframe <- data.frame(matrix(0, ncol = 4, nrow = 3))

for (i in 1:3)
  for (j in 1:4)
  { newframe[i,j] <-  (1 -1*(is.na(x[i,j]))) }

newframe <- as.matrix((newframe))

newframe

compound <- data.frame(matrix(0, ncol = 4, nrow = 3))

for (i in 1:3) 
  for (j in 1:4 )
  {  compound[i,j] <- (as.numeric(x[i,j])*(as.numeric(newframe[i,j])))
}

compound

I'm trying to create an indicator variable for missing (NA) values and use it to build a compound variable that zeroes out the original value when it is NA and sets the indicator.
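
One likely reason the last loop does not behave as intended: in R, arithmetic with NA returns NA, so multiplying a missing value by a 0/1 flag never zeroes it out. The following is a minimal sketch on the same 3x4 example, not necessarily the only fix. The name `present` is illustrative and not from the original post. It replaces the missing entries by logical indexing instead of multiplication:

```r
# Same toy data as in the question
x <- matrix(1:12, nrow = 3, ncol = 4,
            dimnames = list(NULL, c("A", "B", "C", "D")))
x[2, 2] <- NA
x[3, 3] <- NA

# Arithmetic with NA propagates NA, so a 0/1 multiplier cannot remove it
is.na(NA * 0)   # TRUE

# 1 where the value is present, 0 where it is missing (as in the question)
present <- 1 - is.na(x)

# Replace missing entries directly instead of multiplying
compound <- x
compound[is.na(x)] <- 0
compound
```

Because `is.na()` is vectorised over the whole matrix, no explicit loops over rows and columns are needed.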


Solution

  • Create indicator variables for missing values, then zero out (or impute) the NA entries in the original data:

    # create data
    x <- c(1:12)
    dim(x) <- c(3,4)
    x[2,2] <- NA
    x[3,3] <- NA
    
    x
    
    # create indicator matrix: 1 where x is NA, 0 otherwise
    newframe <- 1*(is.na(x))
    
    newframe
    class(newframe)
    
    # zero out NAs in the data (or replace them with imputed values instead)
    x[is.na(x)] <- 0
    
    # combine original data and indicator variables column-wise (result is a matrix)
    newdata <- cbind(x, newframe)
    
    newdata 
    

    Copy and run.
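
The title also asks about naming the indicator variables. As a hedged sketch, assuming the original columns already have names, the indicators could be labelled by appending a suffix. The `_NA` suffix and the names `ind` and `newdata` are illustrative choices, not part of the answer above. For a large data frame it may also help to keep indicators only for columns that actually contain missing values:

```r
# Same toy data, with column names as in the question
x <- matrix(1:12, nrow = 3, ncol = 4,
            dimnames = list(NULL, c("A", "B", "C", "D")))
x[2, 2] <- NA
x[3, 3] <- NA

# Indicator matrix: 1 where the original value is missing
ind <- 1 * is.na(x)

# Illustrative naming scheme: original column name plus "_NA"
colnames(ind) <- paste0(colnames(x), "_NA")

# Keep only indicators for columns that actually contain NAs
ind <- ind[, colSums(ind) > 0, drop = FALSE]

# Zero out the NAs and combine data and indicators into a data frame
x[is.na(x)] <- 0
newdata <- data.frame(x, ind)
names(newdata)
```

`drop = FALSE` keeps `ind` a matrix even when only one column has missing values, so the column name survives the subsetting.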