
Putting two responses in one


I am trying to summarise responses to questions from survey data, and many questions have recorded answers as 999 or 998, which mean "Don't know" and "Refused to answer" respectively. I'm trying to classify both of these under one heading ("No information"), and assign this the number -999. I'm not sure how to proceed.


Solution

  • Here is an approach using dplyr that changes every 998 and 999, in all columns of a data frame, to -999. It assumes that 998 and 999 never occur as ordinary values in the data and only mark missing answers, which is usually the case in survey data.

    # These libraries are needed
    library(dplyr)
    library(car) # not necessary to call, but has to be installed
    
    # Some test data
    data <- data.frame(a = c(1:10, 998),
                       b = c(21:31),
                       c = c(999,31:40))
    
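    # a predicate function which checks if a column x contains 998 or 999;
    # %in% returns FALSE for NA, so columns with missing values do not
    # make the predicate return NA (which mutate_if would reject)
    check_998_999 <- function (x) {
      any(x %in% c(998, 999))
    }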
    
    # change all columns with 998 or 999 so that they become -999 
    data %>% 
      mutate_if(check_998_999,
                ~ car::recode(.x, "c(998,999) = -999"))
    

    I prefer car::recode to dplyr::recode because it needs less specification and can recode elements of different classes. For example, the code above works even when a column is of type character.

    data <- data.frame(a = c(1:10, 998),
                       b = c(21:31),
                       c = c("999",letters[1:10]),
                       stringsAsFactors = FALSE)
    
    data %>% 
      mutate_if(check_998_999,
                ~ car::recode(.x, "c(998,999) = -999"))
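
If installing extra packages is not an option, the same recoding can be sketched in base R. This is a different technique from the dplyr/car approach above, not a restatement of it. The helper name collapse_no_info and its arguments codes and new_code are hypothetical, chosen only for this illustration, and the data frame is the same test data renamed to survey.

```r
# Hypothetical helper (not part of the answer above): replace the
# "Don't know" (999) and "Refused to answer" (998) codes with one code.
collapse_no_info <- function(x, codes = c(998, 999), new_code = -999) {
  # %in% is FALSE for NA, so genuinely missing values are left untouched
  x[x %in% codes] <- new_code
  x
}

# Same test data as above
survey <- data.frame(a = c(1:10, 998),
                     b = 21:31,
                     c = c(999, 31:40))

# Apply the helper to every column; `survey[] <-` keeps the data frame shape
survey[] <- lapply(survey, collapse_no_info)

# Count the "No information" answers per column
no_info_counts <- colSums(survey == -999)
print(no_info_counts)
```

Because the helper is applied to every column, it relies on the same assumption as before: 998 and 999 must not occur as ordinary values anywhere in the data.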