
How to permanently remove all NAs?


I am imputing missing values. The code seems to work at first:

# Replace NA with "None"

vars_to_none = c("Alley", "BsmtQual", "BsmtCond", "BsmtExposure", "BsmtFinType1", "BsmtFinSF1", "BsmtFinType2", "FireplaceQu", "GarageType", "GarageYrBlt", "GarageFinish", "GarageQual", "GarageCond", "PoolQC", "Fence", "MiscFeature", "MasVnrType")

sapply(combi %>% select(vars_to_none), function(x) x = ifelse(is.na(x), "None", x))

Output: a matrix with "None" in the formerly NA spots. Here's a portion of it:

     Alley  BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2
[1,] "None" "Gd"     "TA"     "No"         "GLQ"        "706"      "Unf"
[2,] "None" "Gd"     "TA"     "Gd"         "ALQ"        "978"      "Unf"
[3,] "None" "Gd"     "TA"     "Mn"         "GLQ"        "486"      "Unf"
[4,] "None" "TA"     "Gd"     "No"         "ALQ"        "216"      "Unf"

So far, so good.

But when I check for NAs again...

which(is.na(combi$Alley))

...I get 2000+ entries. head() shows the same thing:

head(combi$Alley)

[1] NA NA NA NA NA NA
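A minimal sketch of what is happening, using a small hypothetical data frame `df` in place of combi: sapply() returns a new object and does not modify its input in place, so unless that result is assigned somewhere, the original data frame keeps its NAs.

```r
# Hypothetical stand-in for combi
df <- data.frame(Alley = c(NA, "Grvl", NA), stringsAsFactors = FALSE)

# The replacement is computed (and printed), but it is not stored anywhere
sapply(df, function(x) ifelse(is.na(x), "None", x))

# df itself is unchanged: positions 1 and 3 are still NA
which(is.na(df$Alley))
```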

I tried assigning the result of sapply() back to combi, which caused an error I'm not familiar with:

combi <- sapply(combi %>% select(vars_to_none), function(x) x = ifelse(is.na(x), "None", x))
head(combi$Alley)

Error in combi$Alley : $ operator is invalid for atomic vectors

which(is.na(combi$Alley))

Error in combi$Alley : $ operator is invalid for atomic vectors
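To see why this second attempt breaks, here is a small sketch with a hypothetical two-column data frame `df` standing in for the selected columns of combi: sapply() simplifies its per-column results into a character matrix, and `$` works on lists (including data frames) but not on atomic vectors such as matrices.

```r
# Hypothetical stand-in for the selected columns of combi
df <- data.frame(Alley  = c(NA, "Grvl"),
                 PoolQC = c("Ex", NA),
                 stringsAsFactors = FALSE)

# sapply() simplifies the list of equal-length results into a matrix
m <- sapply(df, function(x) ifelse(is.na(x), "None", x))

is.matrix(m)   # TRUE
is.atomic(m)   # TRUE: a matrix is an atomic vector with a dim attribute
m[, "Alley"]   # matrix columns are reached with [ , ] ...
try(m$Alley)   # ... while $ fails with "$ operator is invalid for atomic vectors"
```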

How can I get the combi data frame to permanently hold the replacement of NAs with "None"?


Solution

  • Your first attempt does not assign the result back to combi, so combi is unaffected by those calculations.

    You need to assign the result back into those columns:

    combi[vars_to_none] <- sapply(combi %>% select(vars_to_none), 
                                  function(x) ifelse(is.na(x), "None", x))
    

    I would not have mixed tidyverse and base R code, so I would have written:

    combi[vars_to_none] <- lapply(combi[vars_to_none],
                                  function(x) { x[is.na(x)] <- "None"; x })
    

    I'm not sure whether the results would differ, but I suspect my version is more efficient, because it doesn't build several intermediate vectors the length of the column x the way ifelse() does.

    Your second attempt failed because sapply() simplifies its result to a matrix by default, and you replaced all of combi with a matrix containing only the columns you modified. Matrices in R are just atomic vectors with dimensions, and the $ operator does not work on atomic vectors, which is what the error message is telling you.
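Putting the lapply() version together, here is a self-contained sketch. The toy combi, its values, and the shortened vars_to_none are hypothetical. The is.factor() branch is an extra safeguard that is not part of the answer above: if any of these columns were read in as factors (the read.csv() default before R 4.0.0), `x[is.na(x)] <- "None"` would produce NAs with an "invalid factor level" warning, so the sketch converts factors to character first. Numeric columns such as BsmtFinSF1 become character once "None" is inserted.

```r
# Hypothetical toy version of combi
combi <- data.frame(
  Alley      = c(NA, "Grvl", NA),
  BsmtFinSF1 = c(706, NA, 486),
  PoolQC     = factor(c(NA, "Ex", NA)),
  LotArea    = c(8450, 9600, 11250),   # not listed below, so left untouched
  stringsAsFactors = FALSE
)
vars_to_none <- c("Alley", "BsmtFinSF1", "PoolQC")

# lapply() returns a list; assigning it to combi[cols] replaces only those
# columns, and combi stays a data frame
combi[vars_to_none] <- lapply(combi[vars_to_none], function(x) {
  if (is.factor(x)) x <- as.character(x)  # safeguard: "None" is not a factor level
  x[is.na(x)] <- "None"
  x
})

str(combi)
which(is.na(combi$Alley))  # integer(0): no NAs left
```

Because lapply() always returns a list, nothing gets squeezed into a matrix, so assigning it back with `combi[vars_to_none] <-` keeps every other column intact.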