I need to run a chi-square test, so I need the levels of one column (gender) to become the column names, with the values of another variable listed underneath each. Here's some data:
test <- data.frame(gender = as.character(sample(c('male', 'female'), 10, replace = TRUE)),
                   test1 = sample(1:10),
                   test2 = sample(1:5, 10, replace = TRUE))
> test
gender test1 test2
1 female 2 2
2 male 9 1
3 male 4 4
4 female 8 1
5 female 5 4
6 female 3 3
7 female 7 3
8 female 1 1
9 male 10 2
10 male 6 2
I've used the following line of code with tidyr::spread(), but it's giving me an error:
test %>% spread(gender,test1)
Error: Each row of output must be identified by a unique combination of keys.
I've followed all the examples in the tidyr documentation for gather() and spread(), but nothing is working. If you have any tips, please let me know. Here's my desired outcome:
> goal
male female
1 10 3
2 1 4
3 5 10
4 3 9
5 6 7
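The error occurs because `spread` needs the remaining columns to identify each output row uniquely, and here `test2` repeats within a gender. We can drop `test2`, create a sequence column grouped by gender to act as a unique identifier, and then use `spread`: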
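library(dplyr)
library(tidyr)
test %>%
  select(-test2) %>%                # drop the column that is not needed
  group_by(gender) %>%
  mutate(rn = row_number()) %>%     # row index within each gender
  ungroup() %>%
  spread(gender, test1) %>%         # one column per gender level
  select(-rn)                       # remove the helper index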
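In tidyr 1.0.0 and later, `spread` is superseded by `pivot_wider`, so the same idea can be written as follows. This is only a sketch: the data frame is typed in from the printed example in the question rather than sampled, and `wide` is a name I made up for the result.

```r
library(dplyr)
library(tidyr)

# Data copied by hand from the printed example in the question
test <- data.frame(
  gender = c("female", "male", "male", "female", "female",
             "female", "female", "female", "male", "male"),
  test1  = c(2, 9, 4, 8, 5, 3, 7, 1, 10, 6),
  test2  = c(2, 1, 4, 1, 4, 3, 3, 1, 2, 2)
)

wide <- test %>%
  select(gender, test1) %>%
  group_by(gender) %>%
  mutate(rn = row_number()) %>%   # row index within each gender
  ungroup() %>%
  pivot_wider(names_from = gender, values_from = test1) %>%
  select(-rn)                     # remove the helper index

wide
```

Because this sample has six females and four males, the shorter `male` column is padded with `NA`, so you may need to handle those before passing the result to a test.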