Populate the NA values in a variable with values from a different variable in R


I have data that looks like this:

Linking <- data.frame(
ID = c(round((runif(20, min = 10000, max = 99999)), digits = 0), NA, NA, NA, NA),
PSU = c(paste("A", round((runif(20, min = 10000, max = 99999)), digits = 0), sep = ''), NA, NA, NA, NA),
qtr = c(rep(1:10, 2), NA, NA, NA, NA)
)

Linking$Key <- paste(Linking$ID, Linking$PSU, Linking$qtr, sep = "_")
Linking$Key[c(21:24)] <- c("87654_A15467_1", "45623_A23456_2", "67891_A12345_4", "65346_A23987_7")

What I want to do is populate the NA values of ID, PSU, and qtr using the information in "Key", but only for the rows where those values are NA.

Does anyone know how to do this?

This code does what I want, but it overwrites every value of each variable. I want to do this only for rows where the values are NA.

Linking2 <- Linking
Linking2$ID <- substr(Linking$Key,1,5)
Linking2$PSU <- substr(Linking$Key,7,12)
Linking2$qtr <- substr(Linking$Key, 14,15)

Solution

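  • The basic idea here is to assign using a logical index vector, so that only the positions where the value is NA are overwritten. The same index is applied to the right-hand side so the replacement values line up with the rows being filled.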

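    # as.numeric() keeps ID numeric; substr() alone would coerce the whole column to character
    Linking$ID[is.na(Linking$ID)] <- as.numeric(substr(Linking$Key, 1, 5)[is.na(Linking$ID)])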
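
The same indexing idea extends to all three columns. The following is a minimal sketch, not the original answerer's code. It assumes every key has the form ID_PSU_qtr, uses a small hand-written data frame in place of the runif() data so the result is reproducible, and introduces the names parts, na_id, na_psu and na_qtr purely for illustration. It splits each key on the underscore with strsplit() rather than relying on fixed character positions, and converts ID and qtr back to numeric because the split pieces are character strings.

```r
# Small reproducible stand-in for the question's data (assumed values).
Linking <- data.frame(
  ID  = c(12345, 23456, NA, NA),
  PSU = c("A11111", "A22222", NA, NA),
  qtr = c(1, 2, NA, NA),
  stringsAsFactors = FALSE
)
Linking$Key <- paste(Linking$ID, Linking$PSU, Linking$qtr, sep = "_")
Linking$Key[3:4] <- c("87654_A15467_1", "45623_A23456_2")

# Split every key on "_" and bind the pieces into a character matrix
# with one row per key and one column per component.
parts <- do.call(rbind, strsplit(Linking$Key, "_", fixed = TRUE))

# Logical index vectors marking the rows that need filling.
na_id  <- is.na(Linking$ID)
na_psu <- is.na(Linking$PSU)
na_qtr <- is.na(Linking$qtr)

# Assign only into the NA positions; convert where the column is numeric.
Linking$ID[na_id]   <- as.numeric(parts[na_id, 1])
Linking$PSU[na_psu] <- parts[na_psu, 2]
Linking$qtr[na_qtr] <- as.numeric(parts[na_qtr, 3])
```

Splitting on "_" is a design choice: it keeps working if an ID or quarter has a different number of digits, whereas substr() depends on every key having the same fixed widths.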