I have a multi-response question from a survey.
The data look like this:
|respondent| friend |
|----------|-----------------|
| 001 | John, Mary |
|002 | Sue, John, Peter|
Then, I want to count, for each respondent, how many male and female friends they have. I imagine I need to create separate vectors of male and female names, then check each cell in the friend column against these vectors and count.
Any help is appreciated.
Assuming you have a list that links a name with gender, you can split up your friend
column, merge the result with your list and summarise on the gender:
library(tidyverse)
df <- tibble(
respondent = c('001', '002'),
friend = c('John, Mary', 'Sue, John, Peter')
)
names_df <- tibble(
name = c('John', 'Mary', 'Sue','Peter'),
gender = c('M', 'F', 'F', 'M')
)
df %>%
mutate(friend = strsplit(as.character(friend), ", ")) %>%
unnest(friend) %>%
left_join(names_df, by = c('friend' = 'name')) %>%
group_by(respondent) %>%
summarise(male_friends = sum(gender == 'M'),
female_friends = sum(gender == 'F'))
resulting in
# A tibble: 2 x 3
respondent male_friends female_friends
* <chr> <int> <int>
1 001 1 1
2 002 2 1