Search code examples
rdataframelistpivottidyr

How to get from pipe-delimited lists to long format in R using tidyverse?


Suppose I have this data:

> foo <- data.frame(x=c(1,2,3), y=c("a", "b|c", "d|e|f"))
> foo
  x     y
1 1     a
2 2   b|c
3 3 d|e|f

and I would like to get to this:

> foo_long <- data.frame(x=c(1,2,2,3,3,3), y=c("a", "b", "c", "d", "e", "f"))
> foo_long
  x y
1 1 a
2 2 b
3 2 c
4 3 d
5 3 e
6 3 f

I've gotten as far as

> foo_long <- foo %>% mutate(y=str_split(y, "\\|"))
> foo_long
  x       y
1 1       a
2 2    b, c
3 3 d, e, f
> str(foo_long)
'data.frame':   3 obs. of  2 variables:
 $ x: num  1 2 3
 $ y:List of 3
  ..$ : chr "a"
  ..$ : chr  "b" "c"
  ..$ : chr  "d" "e" "f"

but that y column is vectors, and I don't know to get the values out of the vectors.

EDIT: In fact, by the end I'd like to have foo_long2 below, but I think that's another question.

foo_long2 <- data.frame(x=c(1,2,3),
                        a=c(1,0,0), b=c(0,1,0), c=c(0,1,0),
                        d=c(0,0,1), e=c(0,0,1), f=c(0,0,1))
> foo_long2
  x a b c d e f
1 1 1 0 0 0 0 0
2 2 0 1 1 0 0 0
3 3 0 0 0 1 1 1

Solution

  • In the tidyr package there is the separate_longer_delim() or the older separate_rows() function.

    library(tidyr)
    foo <- data.frame(x=c(1,2,3), y=c("a", "b|c", "d|e|f"))
    separate_longer_delim(foo, y, delim ="|")
    
      x y
    1 1 a
    2 2 b
    3 2 c
    4 3 d
    5 3 e
    6 3 f
    

    For the second part here is a working solution with pivot_wider()

    separate_longer_delim(foo, y, delim ="|") %>% 
       pivot_wider(names_from = y, values_from= y, values_fn = length, values_fill = 0)
    
    # A tibble: 3 × 7
          x     a     b     c     d     e     f
      <dbl> <int> <int> <int> <int> <int> <int>
    1     1     1     0     0     0     0     0
    2     2     0     1     1     0     0     0
    3     3     0     0     0     1     1     1