R function for transforming comma-separated values in a cell into multiple rows with same row name?


I have a two-column dataframe in R: the first column is a broad category, and the second column contains comma-separated items within the broad category. This is what it looks like:

Orthogroup Sequences
0 Seq1, Seq2, Seq3
1 Seq4

And this is what I would like it to look like:

Orthogroup Sequence
0 Seq1
0 Seq2
0 Seq3
1 Seq4

To be honest I'm not even really sure where to start... any help is much appreciated!


Solution

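  • You can accomplish this with separate_rows() from the tidyr package. It splits a delimited column into one row per value and repeats the other columns for each new row.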

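    library(tidyverse)
    # Rebuild the example data from the question
    Orthogroup <- c(0, 1)
    Sequences <- c("Seq1, Seq2, Seq3", "Seq4")
    df <- data.frame(Orthogroup, Sequences)
    # Split Sequences on ", " so each item gets its own row
    df %>%
      separate_rows(Sequences, sep = ", ")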
    #> # A tibble: 4 × 2
    #>   Orthogroup Sequences
    #>        <dbl> <chr>    
    #> 1          0 Seq1     
    #> 2          0 Seq2     
    #> 3          0 Seq3     
    #> 4          1 Seq4
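
If you would rather not load a package, the same reshaping can be sketched in base R. This is a minimal sketch, not the answer's method: it assumes the items are separated by a comma followed by optional spaces, and the names `parts` and `long` are just illustrative.

```r
# Same example data as in the question
df <- data.frame(
  Orthogroup = c(0, 1),
  Sequences = c("Seq1, Seq2, Seq3", "Seq4"),
  stringsAsFactors = FALSE
)

# Split each cell on a comma followed by any number of spaces (assumed delimiter)
parts <- strsplit(df$Sequences, ", *")

# Repeat each Orthogroup once per item in its cell, then flatten the items
long <- data.frame(
  Orthogroup = rep(df$Orthogroup, times = lengths(parts)),
  Sequence = unlist(parts),
  stringsAsFactors = FALSE
)
long
```

Here `lengths(parts)` gives the number of items in each original cell, so `rep()` lines every item up with its Orthogroup. Note also that recent tidyr releases (1.3.0 and later) mark separate_rows() as superseded in favour of separate_longer_delim(); separate_rows() still works.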