Suppose I have this data:
> foo <- data.frame(x=c(1,2,3), y=c("a", "b|c", "d|e|f"))
> foo
x y
1 1 a
2 2 b|c
3 3 d|e|f
and I would like to get to this:
> foo_long <- data.frame(x=c(1,2,2,3,3,3), y=c("a", "b", "c", "d", "e", "f"))
> foo_long
x y
1 1 a
2 2 b
3 2 c
4 3 d
5 3 e
6 3 f
I've gotten as far as
> foo_long <- foo %>% mutate(y=str_split(y, "\\|"))
> foo_long
x y
1 1 a
2 2 b, c
3 3 d, e, f
> str(foo_long)
'data.frame': 3 obs. of 2 variables:
$ x: num 1 2 3
$ y:List of 3
..$ : chr "a"
..$ : chr "b" "c"
..$ : chr "d" "e" "f"
but that y column is vectors, and I don't know to get the values out of the vectors.
EDIT: In fact, by the end I'd like to have foo_long2
below, but I think that's another question.
foo_long2 <- data.frame(x=c(1,2,3),
a=c(1,0,0), b=c(0,1,0), c=c(0,1,0),
d=c(0,0,1), e=c(0,0,1), f=c(0,0,1))
> foo_long2
x a b c d e f
1 1 1 0 0 0 0 0
2 2 0 1 1 0 0 0
3 3 0 0 0 1 1 1
In the tidyr package there is the separate_longer_delim()
or the older separate_rows()
function.
library(tidyr)
foo <- data.frame(x=c(1,2,3), y=c("a", "b|c", "d|e|f"))
separate_longer_delim(foo, y, delim ="|")
x y
1 1 a
2 2 b
3 2 c
4 3 d
5 3 e
6 3 f
For the second part here is a working solution with pivot_wider()
separate_longer_delim(foo, y, delim ="|") %>%
pivot_wider(names_from = y, values_from= y, values_fn = length, values_fill = 0)
# A tibble: 3 × 7
x a b c d e f
<dbl> <int> <int> <int> <int> <int> <int>
1 1 1 0 0 0 0 0
2 2 0 1 1 0 0 0
3 3 0 0 0 1 1 1