Search code examples
rtidyversedata-cleaningsurvey

Cleaning rank-choice survey data in R


I have survey data I'm working on in R that includes questions where respondents were given a list of options and asked to rank their top five. The data looks like this:

head(data)
   responseid     ChoiceA     ChoiceB     ChoiceC     ChoiceD     ChoiceE     ChoiceF
1        001            5            2           1         NA           4           3    
2        002           NA            4           3          5           2           1            
3        003            3            1          NA          2           4           5             
4        004           NA            5           2          1           3           4      

I'd like to reshape it so that it looks like this:

head(data_new)
   responseid     first     second     third     fourth     fifth
1        001          C          B         F          E         A      
2        002          F          E         C          B         D                                  
3        003          B          D         A          E         F                               
4        004          D          C         E          F         B                        

What would be an efficient way to do this, preferably in tidyverse? I feel like the answer is going to involve changing the values in a new column based on the names of other columns, which I can't figure out how to do.

I'm also pulling this data out of SurveyGizmo, so if anyone has suggestions for a way to set it up so that the data exports to a csv the way I want it in the first place, I would appreciate that as well. Thanks!


Solution

  • The gist is that you want to pivot longer, clean it up a bit and then pivot wider using the ranks as the column names instead of the choices. This is a similar method to akrun, though I think given the limit of five choices hardcoding the ordinal column names is less complex.

    library(tidyverse)
    tbl <- read_table2(
    "responseid     ChoiceA     ChoiceB     ChoiceC     ChoiceD     ChoiceE     ChoiceF
    001            5            2           1         NA           4           3
    002           NA            4           3          5           2           1
    003            3            1          NA          2           4           5
    004           NA            5           2          1           3           4"
    )
    tbl %>%
      pivot_longer(
        cols = -responseid,
        names_to = "choice",
        values_to = "rank",
        values_drop_na = TRUE
      ) %>%
      mutate(
        choice = str_remove(choice, "Choice"),
        rank = c("first", "second", "third", "fourth", "fifth")[as.integer(rank)]
      ) %>%
      pivot_wider(names_from = rank, values_from = choice) %>%
      select(responseid, first, second, third, fourth, fifth)
    #> # A tibble: 4 x 6
    #>   responseid first second third fourth fifth
    #>   <chr>      <chr> <chr>  <chr> <chr>  <chr>
    #> 1 001        C     B      F     E      A    
    #> 2 002        F     E      C     B      D    
    #> 3 003        B     D      A     E      F    
    #> 4 004        D     C      E     F      B
    

    Created on 2020-03-04 by the reprex package (v0.3.0)