I have a column f
in my dataframe that I would like to spread into multiple columns based on the values in that column. For example:
df <- structure(list(f = c(NA, "18,17,10", "12,8", "17,11,6", "18",
"12", "12", NA, "17,11", "12")), .Names = "f", row.names = c(NA,
10L), class = "data.frame")
df
# f
# 1 <NA>
# 2 18,17,10
# 3 12,8
# 4 17,11,6
# 5 18
# 6 12
# 7 12
# 8 <NA>
# 9 17,11
# 10 12
How would I split column f
into multiple columns indicating the numbers in the row. I'm interested in something like this:
6 8 10 11 12 17 18
1 0 0 0 0 0 0 0
2 0 0 1 0 0 1 1
3 0 1 0 0 1 0 0
4 1 0 0 1 0 1 0
5 0 0 0 0 0 0 1
6 0 0 0 0 1 0 0
7 0 0 0 0 1 0 0
8 0 0 0 0 0 0 0
9 0 0 0 1 0 1 0
10 0 0 0 0 1 0 0
I'm thinking I could useunique
on the f
column to create the seperate columns based on the different numbers and then do a grepl
to determine if the specific number is in column f
but I was wondering if there was a better way. Something similar to spread
or separate
in the tidyr
package.
A solution using tidyr::separate_rows
will be as:
library(tidyverse)
df %>% mutate(ind = row_number()) %>%
separate_rows(f, sep=",") %>%
mutate(f = ifelse(is.na(f),0, f)) %>%
count(ind, f) %>%
spread(f, n, fill = 0) %>%
select(-2) %>% as.data.frame()
# ind 10 11 12 17 18 6 8
# 1 1 0 0 0 0 0 0 0
# 2 2 1 0 0 1 1 0 0
# 3 3 0 0 1 0 0 0 1
# 4 4 0 1 0 1 0 1 0
# 5 5 0 0 0 0 1 0 0
# 6 6 0 0 1 0 0 0 0
# 7 7 0 0 1 0 0 0 0
# 8 8 0 0 0 0 0 0 0
# 9 9 0 1 0 1 0 0 0
# 10 10 0 0 1 0 0 0 0