Search code examples
rdplyrtidyversedata-cleaningsurvey

Tally several columns of categorical variables in R


I have survey data where respondents rated several items on a Likert scale, which looks like this:

id  item1                   item2                   item3                   item4

42  Moderately adequate     Completely adequate     Very adequate           Very adequate
48  Moderately adequate     Moderately adequate     Moderately adequate     Moderately adequate
49  Moderately adequate     Moderately adequate     Moderately adequate     Moderately adequate
50  Slightly adequate       Slightly adequate       Slightly adequate       Not at all adequate

I want to convert this to a data structure that, for each item, has a count of ratings it received, like so:

rating              item1       item2       item3       item4

Not at all adequate     0           0           0           1
Slightly adequate       1           1           1
Moderately adequate     3           2           2           2
Very adequate           0           0           1           1
Completely adequate     0           1           0           0

What's the most efficient way to reshape this data? I've tried dcast(data = melt(data, id.vars = "id"), value ~.), but that totals the ratings across all four items instead of preserving each item in its own column; same problem for count and tally. I could do this item by item and then merge the columns back together but it seems like there's got to be a simpler way, especially since I need to replicate this across several different lists of items.


Solution

  • Get data in long format, count and get it back in wide format :

    library(dplyr)
    library(tidyr)
    
    data %>%
      pivot_longer(cols = -id) %>%
      count(name, value) %>%
      pivot_wider(names_from = name, values_from = n, values_fill = list(n = 0))
    
    # A tibble: 5 x 5
    #  value               item1 item2 item3 item4
    #  <chr>               <int> <int> <int> <int>
    #1 Moderately_adequate     3     2     2     2
    #2 Slightly_adequate       1     1     1     0
    #3 Completely_adequate     0     1     0     0
    #4 Very_adequate           0     0     1     1
    #5 Not_at_all_adequate     0     0     0     1
    

    data

    I added underscore to values in item columns since it was difficult to copy the data with spaces.

    data <- structure(list(id = c(42L, 48L, 49L, 50L),item1 = c("Moderately_adequate",
    "Moderately_adequate", "Moderately_adequate", "Slightly_adequate"
    ), item2 = c("Completely_adequate", "Moderately_adequate", "Moderately_adequate",
    "Slightly_adequate"), item3 = c("Very_adequate", "Moderately_adequate", 
    "Moderately_adequate", "Slightly_adequate"), item4 = c("Very_adequate", 
    "Moderately_adequate", "Moderately_adequate", "Not_at_all_adequate"
    )), class = "data.frame", row.names = c(NA, -4L))