Search code examples
rcountsurvey

Count multi-response answers aginst a vector in R


I have a multi-response question from a survey.

The data look like this:

|respondent| friend          |
|----------|-----------------|
| 001      | John, Mary      |
|002       | Sue, John, Peter|

Then, I want to count, for each respondent, how many male and female friends they have. I imagine I need to create separate vectors of male and female names, then check each cell in the friend column against these vectors and count.

Any help is appreciated.


Solution

  • Assuming you have a list that links a name with gender, you can split up your friend column, merge the result with your list and summarise on the gender:

    library(tidyverse)
    df <- tibble(
      respondent = c('001', '002'), 
      friend = c('John, Mary', 'Sue, John, Peter')
    )
    
    names_df <- tibble(
      name = c('John', 'Mary', 'Sue','Peter'),
      gender = c('M', 'F', 'F', 'M')
    )
    
    df %>% 
      mutate(friend = strsplit(as.character(friend), ", ")) %>% 
      unnest(friend) %>% 
      left_join(names_df, by = c('friend' = 'name')) %>% 
      group_by(respondent) %>% 
      summarise(male_friends = sum(gender == 'M'), 
                female_friends = sum(gender == 'F'))
    

    resulting in

    # A tibble: 2 x 3
      respondent male_friends female_friends
    * <chr>             <int>          <int>
    1 001                   1              1
    2 002                   2              1