Calculate 'total' variables from scoring system applied to data stored as factors


I'm working in R (and tidyverse) with data from a questionnaire consisting of 11 questions, each answered on a 4-point Likert scale:

  • Less than usual
  • No more than usual
  • More than usual
  • Much more than usual

The data is in a data frame with participants as rows, and responses to each question stored as an ordered factor in individual columns.

The following code replicates 5 rows of the data as it is currently stored:

library(tidyverse)

df <- tibble(id  = c(1, 2, 3, 4, 5), q1  = c(3, 4, 2, 3, 3),
             q2  = c(4, 4, 2, 3, 2), q3  = c(3, 3, 2, 2, 3),
             q4  = c(2, 2, 3, 2, 1), q5  = c(3, 3, 3, 3, 3),
             q6  = c(4, 3, 2, 2, 2), q7  = c(1, 2, 2, 2, 2),
             q8  = c(3, 3, 3, 2, 1), q9  = c(3, 4, 4, 2, 1),
             q10 = c(2, 4, 3, 2, 1), q11 = c(2, 3, 2, 2, 1)) %>% 
  mutate(across(q1:q11, ~factor(.x,
                                levels = c(1, 2, 3, 4),
                                labels = c("Less than usual",
                                           "No more than usual",
                                           "More than usual",
                                           "Much more than usual"),
                                ordered = TRUE)))

str(df)

# tibble [5 × 12] (S3: tbl_df/tbl/data.frame)
#  $ id : num [1:5] 1 2 3 4 5
#  $ q1 : Ord.factor w/ 4 levels "Less than usual"<..: 3 4 2 3 3
#  $ q2 : Ord.factor w/ 4 levels "Less than usual"<..: 4 4 2 3 2
#  $ q3 : Ord.factor w/ 4 levels "Less than usual"<..: 3 3 2 2 3
#  $ q4 : Ord.factor w/ 4 levels "Less than usual"<..: 2 2 3 2 1
#  $ q5 : Ord.factor w/ 4 levels "Less than usual"<..: 3 3 3 3 3
#  $ q6 : Ord.factor w/ 4 levels "Less than usual"<..: 4 3 2 2 2
#  $ q7 : Ord.factor w/ 4 levels "Less than usual"<..: 1 2 2 2 2
#  $ q8 : Ord.factor w/ 4 levels "Less than usual"<..: 3 3 3 2 1
#  $ q9 : Ord.factor w/ 4 levels "Less than usual"<..: 3 4 4 2 1
#  $ q10: Ord.factor w/ 4 levels "Less than usual"<..: 2 4 3 2 1
#  $ q11: Ord.factor w/ 4 levels "Less than usual"<..: 2 3 2 2 1

I need to calculate totals using two different scoring systems, both for the whole questionnaire and for two subscales made up of selected questions. The first subscale consists of questions 1–7 and the second of questions 8–11.

  • The first scoring system (Likert) assigns the values 0, 1, 2 and 3 to the factor levels, respectively.
  • The second scoring system (Binary) assigns the values 0, 0, 1 and 1, respectively.
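To make the two scoring systems concrete, here is a minimal sketch of them as named lookup vectors, using the labels from the code above. The names labs, likert_scores, binary_scores and the small example factor x are only illustrative and are not part of the real data.

```r
# The four response labels, in the order of the factor levels
labs <- c("Less than usual", "No more than usual",
          "More than usual", "Much more than usual")

# One score per label for each scoring system (illustrative names)
likert_scores <- setNames(c(0, 1, 2, 3), labs)
binary_scores <- setNames(c(0, 0, 1, 1), labs)

# A small made-up ordered factor to show the lookup
x <- factor(c("More than usual", "Less than usual", "Much more than usual"),
            levels = labs, ordered = TRUE)

# Look each response up by its label
likert_x <- unname(likert_scores[as.character(x)])  # 2 0 3
binary_x <- unname(binary_scores[as.character(x)])  # 1 0 1
```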

How can I calculate these totals using the two scoring systems to get the 6 (sub)totals: total_likert, total_binary, total_ss1_likert, total_ss1_binary, total_ss2_likert and total_ss2_binary?


Solution

  • You can first convert the responses to numeric scores under each scoring system with across and recode (replace is an alternative you might prefer), and then calculate the sum scores for each id using rowwise and c_across:

    df %>%
      # Likert scoring: map the four labels to 0, 1, 2, 3 in new likert_q* columns
      mutate(across(starts_with("q"), ~ recode(.x, "Less than usual" = 0,
                                               "No more than usual" = 1,
                                               "More than usual" = 2,
                                               "Much more than usual" = 3),
                    .names = "likert_{.col}")) %>%
      # Binary scoring: map the labels to 0, 0, 1, 1 in new binary_q* columns
      # (starts_with("q") still matches only q1:q11, not the likert_q* columns)
      mutate(across(starts_with("q"), ~ recode(.x, "Less than usual" = 0,
                                               "No more than usual" = 0,
                                               "More than usual" = 1,
                                               "Much more than usual" = 1),
                    .names = "binary_{.col}")) %>%
      # Sum the recoded columns row by row for the full scale and both subscales
      rowwise(id) %>% mutate(total_likert = sum(c_across(likert_q1:likert_q11)),
                             total_ss1_likert = sum(c_across(likert_q1:likert_q7)),
                             total_ss2_likert = sum(c_across(likert_q8:likert_q11)),
                             total_binary = sum(c_across(binary_q1:binary_q11)),
                             total_ss1_binary = sum(c_across(binary_q1:binary_q7)),
                             total_ss2_binary = sum(c_across(binary_q8:binary_q11))) %>%
      # Drop the rowwise grouping and keep only the id and the six totals
      ungroup() %>%
      select(id, total_likert, total_binary, total_ss1_likert, total_ss1_binary, total_ss2_likert, total_ss2_binary)
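As an alternative sketch (not the approach above, just another way to get the same totals), you could rely on the integer codes of the ordered factor, which run from 1 to 4 in level order, and sum with rowSums instead of rowwise. This assumes every question uses the same four levels in the same order. The helper names labs, likert, binary, ss1, ss2 and totals are made up for this example, and the data is rebuilt from the question so the snippet runs on its own.

```r
library(dplyr)

labs <- c("Less than usual", "No more than usual",
          "More than usual", "Much more than usual")

# Same sample data as in the question, rebuilt so the sketch is self-contained
df <- data.frame(id  = c(1, 2, 3, 4, 5), q1  = c(3, 4, 2, 3, 3),
                 q2  = c(4, 4, 2, 3, 2), q3  = c(3, 3, 2, 2, 3),
                 q4  = c(2, 2, 3, 2, 1), q5  = c(3, 3, 3, 3, 3),
                 q6  = c(4, 3, 2, 2, 2), q7  = c(1, 2, 2, 2, 2),
                 q8  = c(3, 3, 3, 2, 1), q9  = c(3, 4, 4, 2, 1),
                 q10 = c(2, 4, 3, 2, 1), q11 = c(2, 3, 2, 2, 1)) %>%
  mutate(across(q1:q11, ~ factor(.x, levels = 1:4, labels = labs,
                                 ordered = TRUE)))

# Integer codes of the factor are 1..4, so code - 1 gives the Likert score
likert <- df %>%
  select(q1:q11) %>%
  mutate(across(everything(), ~ as.integer(.x) - 1L))

# Binary score: 1 for the two highest levels (Likert score 2 or 3), else 0
binary <- likert %>%
  mutate(across(everything(), ~ as.integer(.x >= 2L)))

ss1 <- paste0("q", 1:7)   # columns of the first subscale
ss2 <- paste0("q", 8:11)  # columns of the second subscale

# One row per participant with the six (sub)totals
totals <- data.frame(
  id               = df$id,
  total_likert     = rowSums(likert),
  total_binary     = rowSums(binary),
  total_ss1_likert = rowSums(likert[ss1]),
  total_ss1_binary = rowSums(binary[ss1]),
  total_ss2_likert = rowSums(likert[ss2]),
  total_ss2_binary = rowSums(binary[ss2])
)
totals
```

For the first participant in the sample data this gives total_likert = 19 and total_binary = 7. Note that rowSums returns NA for any row with a missing response unless you pass na.rm = TRUE, so decide how you want to treat incomplete questionnaires before relying on the totals.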