Search code examples
rdplyrtidyrdata-cleaning

How can I transform multiple choice character columns to numeric indicator columns using tidyverse?


Some types of survey software handle "choose all that apply" questions in the following inconvenient way. Suppose a question asked "What type of pet(s) do you own? Choose all that apply: dog, cat, ferret, snake." The resulting dataset looks like this:

pet_tab <- tibble(
  owner = 1:5,
  pet_1 = c("dog", "cat", "ferret", "dog", "snake"),
  pet_2 = c("cat", "ferret", NA, "ferret", NA),
  pet_3 = c("ferret", NA, NA, "snake", NA),
  pet_4 = c("snake", NA, NA, NA, NA)
  )

  owner pet_1  pet_2  pet_3  pet_4
   1    dog    cat    ferret snake
   2    cat    ferret NA     NA   
   3    ferret NA     NA     NA   
   4    dog    ferret snake  NA   
   5    snake  NA     NA     NA  

This is hard to work with. A far better way to organize this data would be like this:

  owner  dog   cat ferret snake
     1     1     1      1     1
     2     0     1      1     0
     3     0     0      1     0
     4     1     0      1     1
     5     0     0      0     1

where each column indicates whether or not an owner has a given type of animal. How can I transform the first type of data into the second type? I realize there are a lot of ways to do this, but I'd like something elegant, concise, and preferably using tidyverse, though data.table will suffice as well.


Solution

  • We can reshape to 'long' format with pivot_longer on the 'pet' columns and then to wide with pivot_wider with values_fn as length and values_fill as 0

    library(dplyr)
    library(tidyr)
    pet_tab  %>%
       pivot_longer(cols = starts_with('pet'), values_drop_na = TRUE) %>% 
       pivot_wider(names_from = value, values_from = name,
          values_fn = ~ +(length(.x) > 0), values_fill = 0)
    

    -output

    # A tibble: 5 × 5
      owner   dog   cat ferret snake
      <int> <int> <int>  <int> <int>
    1     1     1     1      1     1
    2     2     0     1      1     0
    3     3     0     0      1     0
    4     4     1     0      1     1
    5     5     0     0      0     1
    

    Or in base R with table

    +(table(pet_tab$owner[row(pet_tab[-1])], unlist(pet_tab[-1])) > 0)
       
        cat dog ferret snake
      1   1   1      1     1
      2   1   0      1     0
      3   0   0      1     0
      4   0   1      1     1
      5   0   0      0     1