I am trying to summarise responses to questions from survey data, and many questions have recorded answers as 999 or 998, which mean "Don't know" and "Refused to answer" respectively. I'm trying to classify both of these under one heading ("No information"), and assign this the number -999. I'm not sure how to proceed.
Here is an approach using dplyr that changes all 998 and 999 values in all columns of a data frame to -999. The assumption is that 998 and 999 are not used as "normal" numbers in the data, but only to indicate missing values, which is usually the case in survey data.
# These libraries are needed
library(dplyr)
library(car) # not necessary to call, but has to be installed
# Some test data
data <- data.frame(a = c(1:10, 998),
                   b = c(21:31),
                   c = c(999, 31:40))
# a predicate function which checks if a column x contains 998 or 999
check_998_999 <- function (x) {
  # %in% never returns NA, so columns with missing values are handled too
  any(x %in% c(998, 999))
}
# change all columns with 998 or 999 so that they become -999
data %>%
  mutate_if(check_998_999,
            ~ car::recode(.x, "c(998,999) = -999"))
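If you would rather not depend on car, the same recoding can probably be done with dplyr alone. The following is a minimal sketch, assuming dplyr 1.0.0 or later (where across() supersedes mutate_if()) and assuming that only numeric columns need recoding. The name recoded is just for illustration.

```r
library(dplyr)

# Same test data as above
data <- data.frame(a = c(1:10, 998),
                   b = c(21:31),
                   c = c(999, 31:40))

# In every numeric column, replace 998 and 999 with -999.
# %in% is used instead of == so that NA values are left untouched.
recoded <- data %>%
  mutate(across(where(is.numeric),
                ~ replace(.x, .x %in% c(998, 999), -999)))

print(recoded)
```

Note that this variant skips character columns entirely, so it does not cover the mixed-class case that car::recode handles.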
I prefer car::recode to dplyr::recode because you have to be less specific and you can recode elements of different classes. For example, the above even works when a column is character:
data <- data.frame(a = c(1:10, 998),
                   b = c(21:31),
                   c = c("999", letters[1:10]),
                   stringsAsFactors = FALSE)
data %>%
  mutate_if(check_998_999,
            ~ car::recode(.x, "c(998,999) = -999"))
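Since everything here rests on the assumption that 998 and 999 only occur as missing-value codes, it may be worth counting how often they appear in each column before recoding. This is a small base R sketch; count_codes is a hypothetical helper name, not part of any package.

```r
# Same test data as above, with one character column
data <- data.frame(a = c(1:10, 998),
                   b = c(21:31),
                   c = c("999", letters[1:10]),
                   stringsAsFactors = FALSE)

# Hypothetical helper: number of 998/999 codes in a single column.
# %in% compares as character for character columns, so "999" is matched too.
count_codes <- function(x) sum(x %in% c(998, 999))

# One count per column, returned as a named vector
code_counts <- sapply(data, count_codes)
print(code_counts)
```

A column with an unexpectedly high count could be a sign that 998 or 999 is a real value there (an ID or an amount, say) and should be left out of the recoding.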