How to make the values of one column the main column names using spread()


I need to run a chi-square test so I need the levels of one column (gender) to be the column names for the output of different variables. Here's some data:

test <- data.frame(gender = sample(c('male', 'female'), 10, replace = TRUE),
                   test1 = sample(1:10),
                   test2 = sample(1:5, 10, replace = TRUE))
> test
   gender test1 test2
1  female     2     2
2    male     9     1
3    male     4     4
4  female     8     1
5  female     5     4
6  female     3     3
7  female     7     3
8  female     1     1
9    male    10     2
10   male     6     2

I've used the following line of code with tidyr::spread(), but it gives me an error:

test %>% spread(gender,test1)
Error: Each row of output must be identified by a unique combination of keys.

I've followed all the examples that tidyr provides for gather() and spread(), but nothing is working. If you have any tips, please let me know. Here's my desired outcome:

> goal
  male female
1   10      3
2    1      4
3    5     10
4    3      9
5    6      7

Solution

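  • `spread()` fails here because, once `gender` and `test1` are spread, the remaining column does not uniquely identify each output row. We can create a sequence column grouped by `gender` to serve as a unique identifier and then use `spread()`.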

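    library(dplyr)
    library(tidyr)
    test %>% 
        select(-test2) %>%               # keep only gender and test1
        group_by(gender) %>% 
        mutate(rn = row_number()) %>%    # per-gender row index: the unique key
        ungroup() %>% 
        spread(gender, test1) %>%        # one column per gender level
        select(-rn)                      # drop the helper index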
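
Since tidyr 1.0.0, `spread()` is superseded by `pivot_wider()`. The following is a minimal sketch of the same approach with the newer function, assuming tidyr >= 1.0.0 is installed. It uses the `gender` and `test1` values printed in the question; the name `wide` is only illustrative.

```r
library(dplyr)
library(tidyr)

# gender and test1 values copied from the data printed in the question
test <- data.frame(
  gender = c("female", "male", "male", "female", "female",
             "female", "female", "female", "male", "male"),
  test1  = c(2, 9, 4, 8, 5, 3, 7, 1, 10, 6)
)

wide <- test %>%
  group_by(gender) %>%
  mutate(rn = row_number()) %>%   # per-gender row index makes each row unique
  ungroup() %>%
  pivot_wider(names_from = gender, values_from = test1) %>%
  select(-rn)                     # drop the helper index

wide
```

Because this sample has six `female` rows and four `male` rows, the shorter `male` column is padded with `NA`. The columns only come out the same length, as in the desired outcome, when both groups have the same number of rows.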