Summing up responses on several factor levels


I have two identically coded variables because the question allowed multiple responses.

Say the variables are about hobbies: 1 = football, 2 = ice hockey, 3 = I have no hobbies

Thus, one can have two hobbies: football PLUS ice hockey.

hobby1<-c(1,2,3)
hobby1<-factor(hobby1,labels=c("football", "ice hockey", "I have no hobbies"))

hobby2<-c(1,2,3)
hobby2<-factor(hobby2,labels=c("football", "ice hockey", "I have no hobbies"))

Now I am trying to extract the number of hobbies per person, ranging from 0 to 2.

I already tried: sum(hobby1<2, hobby2<2)

How can this be done, given that sum() does not work on factors? Also, my attempt would not take the third category (no hobbies) into account.

Should I perhaps change my data arrangement, e.g. to dummy coding (football yes/no, ...)?


Solution

  • Dummy coding could be an easier approach, since once the data are stored as a factor you can't easily use sum() or the < operator on them. This approach works in base R:

    df <- data.frame(football = c(0, 1, 1, 0),
                     ice_hockey = c( 1, 1, 0, 0))
    df$num_hobbies <- rowSums(df[, 1:2])
    df
    # football ice_hockey num_hobbies
    #        0          1           1
    #        1          1           2
    #        1          0           1
    #        0          0           0
    

    Or using dplyr to take advantage of column names a little more easily:

    library(dplyr)
    df <- data.frame(football = c(0, 1, 1, 0),
                     ice_hockey = c( 1, 1, 0, 0)) %>%
      mutate(num_hobbies = football + ice_hockey)
    df
    # football ice_hockey num_hobbies
    #        0          1           1
    #        1          1           2
    #        1          0           1
    #        0          0           0
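
If the data are already stored as the two factors from the question, the count can also be derived without recoding by hand. The following is a minimal sketch, not a definitive solution: the values in hobby2 are made up for illustration, and it assumes each respondent has exactly one entry per response slot. The first variant counts the slots that are not "I have no hobbies"; the second builds the dummy columns from the factors, so a hobby named in both slots is counted only once.

```r
# Hypothetical example data: three respondents, two response slots each.
hobby_labels <- c("football", "ice hockey", "I have no hobbies")
hobby1 <- factor(c(1, 2, 3), levels = 1:3, labels = hobby_labels)
hobby2 <- factor(c(2, 3, 3), levels = 1:3, labels = hobby_labels)

# Variant 1: comparing a factor with a string gives a logical vector,
# and adding logical vectors counts the TRUE values per respondent.
none <- "I have no hobbies"
num_hobbies <- (hobby1 != none) + (hobby2 != none)
num_hobbies

# Variant 2: derive dummy columns (1 = hobby named in either slot),
# then sum across the rows as in the answer above.
df <- data.frame(
  football   = as.integer(hobby1 == "football"   | hobby2 == "football"),
  ice_hockey = as.integer(hobby1 == "ice hockey" | hobby2 == "ice hockey")
)
df$num_hobbies <- rowSums(df)
df
```

Variant 1 would count 2 for someone who entered the same hobby in both slots, so variant 2 is the safer choice if such duplicates can occur in the data.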