Search code examples
rlevels

Convert character to a Factor with numeric values other than 1, 2, 3 to calculate score


In R, I have a dataframe with the answers to a questionnaire, and due to technical reasons the data could only be provided as the answer labels, so not the answer codes.

The answers can be Never, Sometimes, Always which corresponds with the answer codes 0, 1, 3, and these codes can be used to calculate a total score.

I've isolated a test dataset (the full data has more columns) and the R syntax I've got so far.

df <- read.table(text = "
DateOfBirth,Sex,Question_01,Question_02,Question_03,Question_04
12-03-2001,M,Sometimes,Sometimes,Never,Never
21-08-1988,F,Always,Never,Always,Sometimes
30-11-1985,F,Always,Always,Sometimes,Never
01-04-2001,M,Sometimes,Sometimes,Never,Never
12-12-1986,M,Always,Sometimes,Always,Never
19-02-1982,F,Always,Always,Never,Sometimes
11-05-1980,M,Sometimes,Sometimes,Never,Never
01-06-2000,F,Always,Sometimes,Always,Always
20-11-1981,F,Never,Never,Never,Sometimes
30-07-1982,M,Never,Never,Sometimes,Never
", header = TRUE, sep = ",", na.strings = "")

# Levels F, M (alphabetically sorted -> 1, 2)
df$Sex_fact <- factor(df$Sex)

lookup_freq <- c(
  "0" = "Never",
  "1" = "Sometimes",
  "3" = "Always"
)
df$Q1 <- factor(df$Question_01, levels = lookup_freq, labels = names(lookup_freq))
df$Q2 <- factor(df$Question_02, levels = lookup_freq, labels = names(lookup_freq))
df$Q3 <- factor(df$Question_03, levels = lookup_freq, labels = names(lookup_freq))
df$Q4 <- factor(df$Question_04, levels = lookup_freq, labels = names(lookup_freq))

df$Total_score <- as.numeric(df$Q1) + as.numeric(df$Q2) + as.numeric(df$Q3) + as.numeric(df$Q4)

print(df)

The Total_score should just be all the answers added up, according to the 0, 1, 3 codes. The result I get is incorrect, and I understand that this is because it uses the numeric values of the factor, which is just 1, 2, 3. But I don't know how to set the levels to the desired codes, using the list I've got in lookup_freq.

   DateOfBirth Sex Question_01 Question_02 Question_03 Question_04 Sex_fact Q1 Q2 Q3 Q4 Total_score
1   12-03-2001   M   Sometimes   Sometimes       Never       Never        M  1  1  0  0           6
2   21-08-1988   F      Always       Never      Always   Sometimes        F  3  0  3  1           9
3   30-11-1985   F      Always      Always   Sometimes       Never        F  3  3  1  0           9
4   01-04-2001   M   Sometimes   Sometimes       Never       Never        M  1  1  0  0           6
5   12-12-1986   M      Always   Sometimes      Always       Never        M  3  1  3  0           9
6   19-02-1982   F      Always      Always       Never   Sometimes        F  3  3  0  1           9
7   11-05-1980   M   Sometimes   Sometimes       Never       Never        M  1  1  0  0           6
8   01-06-2000   F      Always   Sometimes      Always      Always        F  3  1  3  3          11
9   20-11-1981   F       Never       Never       Never   Sometimes        F  0  0  0  1           5
10  30-07-1982   M       Never       Never   Sometimes       Never        M  0  0  1  0           5

So my question is, is it possible to create a Factor column with the labels Never, Sometimes, Always but with the underlying levels as 0, 1, 3 ? Or is it better to just create new columns and replace the character label values with the integer score values?

Or what is the best way to use a lookup variable, similar to lookup_freq? Because I've got more questionnaires with similarly coded answer columns, sometimes with 4 or 5 answer possibilities.


Solution

  • Option 1: extra level

    A hack, but you can create factors such that they resolve to 1, 2, and 4 (instead of 0, 1, and 3) and subtract 1 for each column you're reference.

    ### ...
    lookup_freq <- c("Never", "Sometimes", "Who Cares", "Always")
    as.numeric(factor(c("Never", "Sometimes", "Always"), levels=lookup_freq)) - 1
    # [1] 0 1 3
    
    df$Q1 <- factor(df$Question_01, levels = lookup_freq)
    df$Q2 <- factor(df$Question_02, levels = lookup_freq)
    df$Q3 <- factor(df$Question_03, levels = lookup_freq)
    df$Q4 <- factor(df$Question_04, levels = lookup_freq)
    
    (as.numeric(df$Q1) + as.numeric(df$Q2) + as.numeric(df$Q3) + as.numeric(df$Q4)) - 4
    #  [1]  2  7  7  2  7  7  2 10  1  1
    
    # or more clearly
    rowSums(sapply(df[,c("Q1","Q2","Q3","Q4")], as.numeric)-1)
    #  [1]  2  7  7  2  7  7  2 10  1  1
    

    Option 2: lookup dictionary

    # original `df` with no factors
    lookup_dict <- c("Never"=0, "Sometimes"=1, "Always"=3)
    rowSums(sapply(df[,3:6], function(i) lookup_dict[i]))
    # Sometimes    Always    Always Sometimes    Always    Always Sometimes    Always     Never     Never 
    #         2         7         7         2         7         7         2        10         1         1