I am imputing missing values. The function seems to work at first:
# Replace NA with "None"
vars_to_none = c("Alley", "BsmtQual", "BsmtCond", "BsmtExposure", "BsmtFinType1", "BsmtFinSF1", "BsmtFinType2", "FireplaceQu", "GarageType", "GarageYrBlt", "GarageFinish", "GarageQual", "GarageCond", "PoolQC", "Fence", "MiscFeature", "MasVnrType")
sapply(combi %>% select(vars_to_none), function(x) x = ifelse(is.na(x), "None", x))
Output: a dataframe with "None" in formerly NA spots. Here's a portion of the output.
Alley BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2
[1,] "None" "Gd" "TA" "No" "GLQ" "706" "Unf"
[2,] "None" "Gd" "TA" "Gd" "ALQ" "978" "Unf"
[3,] "None" "Gd" "TA" "Mn" "GLQ" "486" "Unf"
[4,] "None" "TA" "Gd" "No" "ALQ" "216" "Unf"
So good so far.
But when I check for NA's again...
which(is.na(combi$Alley))
...I get 2000+ entries. head() shows the same thing:
head(combi$Alley)
[1] NA NA NA NA NA NA
I tried saving the sapply function to combi, which caused an error I'm not familiar with.
combi <- sapply(combi %>% select(vars_to_none), function(x) x = ifelse(is.na(x), "None", x))
head(combi$Alley)
Error in combi$Alley : $ operator is invalid for atomic vectors
> which(is.na(combi$Alley))
Error in combi$Alley : $ operator is invalid for atomic vectors
How can I get the combi dataframe to permanently hold the replacement of NA's with "None"?
The first effort at code you offered does not have an assignment back to combi, so combi will be unaffected by those calculations.
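As a minimal sketch of why nothing changed, the toy data frame below (demo, with made-up values, standing in for combi) shows that ifelse() hands back a new vector and leaves its input alone:

```r
# Toy data frame; the name and values are made up for illustration.
demo <- data.frame(Alley = c(NA, "Grvl", NA), stringsAsFactors = FALSE)

# ifelse() returns a new vector; demo itself is not modified.
replaced <- ifelse(is.na(demo$Alley), "None", demo$Alley)

print(replaced)                # the returned copy has "None" where NA was
print(sum(is.na(demo$Alley)))  # the original column still holds its NAs
```

The same applies to sapply(): unless its result is stored somewhere, it is only printed and then discarded.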
You need to do:
combi[vars_to_none] <- sapply(combi %>% select(vars_to_none),
function(x) x = ifelse(is.na(x), "None", x))
I would not have used a mixture of tidyverse and base code, so I would have answered:
combi[vars_to_none] <- lapply( combi[vars_to_none] ,
function(x) { x[is.na(x)] <- "None"; x} )
I'm not sure whether the result would be different, but I suspect my version is more efficient, because ifelse() has to build several intermediate vectors the length of the x column, while the indexed assignment only touches the NA positions.
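Here is a small self-contained sketch of the lapply version with the assignment back. The data frame combi_demo, its values, and vars_demo are hypothetical stand-ins for combi and vars_to_none, and the columns are assumed to be character rather than factor:

```r
# Toy stand-in for combi; the values are made up for illustration.
combi_demo <- data.frame(
  Alley    = c(NA, "Grvl", NA),
  BsmtQual = c("Gd", NA, "TA"),
  LotArea  = c(8450, 9600, 11250),
  stringsAsFactors = FALSE
)
vars_demo <- c("Alley", "BsmtQual")

# Assign the result back to the same columns so the change persists.
# lapply() returns a list, which fits a set of data frame columns directly.
combi_demo[vars_demo] <- lapply(combi_demo[vars_demo], function(x) {
  x[is.na(x)] <- "None"
  x
})

print(combi_demo)
print(anyNA(combi_demo[vars_demo]))  # no NAs left in the selected columns
```

Columns that are not listed in vars_demo (LotArea here) keep their original type and values, and combi_demo is still a data frame, so combi_demo$Alley keeps working.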
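The second effort failed because sapply simplifies its result to a matrix whenever every column returns a vector of the same length, and you replaced all of combi with a matrix-ified version of just the columns that you modified. Matrices in R are just atomic vectors with dimensions, which is why the $ operator no longer works on the result.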
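The difference between the two return types can be checked on a toy example. The names df_demo, m, and l are made up for illustration:

```r
# Toy data frame with character columns; values are made up.
df_demo <- data.frame(a = c(NA, "x"), b = c("y", NA), stringsAsFactors = FALSE)

# sapply() simplifies equal-length results into a character matrix.
m <- sapply(df_demo, function(x) ifelse(is.na(x), "None", x))
print(is.matrix(m))
print(is.data.frame(m))

# `$` is not defined for a matrix; capture the error message instead of stopping.
msg <- tryCatch(m$a, error = function(e) conditionMessage(e))
print(msg)

# A matrix column is reached with [ , "name"] instead.
print(m[, "a"])

# lapply() never simplifies: it always returns a list, one element per column.
l <- lapply(df_demo, function(x) ifelse(is.na(x), "None", x))
print(is.list(l))
print(l$a)
```

Overwriting combi with m is what produced the error in the question. Assigning into combi[vars_to_none] instead keeps combi a data frame.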