I have a two-column dataframe in R: the first column is a broad category, and the second column contains comma-separated items within the broad category. This is what it looks like:
| Orthogroup | Sequences |
|---|---|
| 0 | Seq1, Seq2, Seq3 |
| 1 | Seq4 |
And this is what I would like it to look like:
| Orthogroup | Sequence |
|---|---|
| 0 | Seq1 |
| 0 | Seq2 |
| 0 | Seq3 |
| 1 | Seq4 |
To be honest I'm not even really sure where to start... any help is much appreciated!
You can accomplish this with `separate_rows()` from the package `tidyr`.
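library(tidyverse)
# Rebuild the example data from the question
Orthogroup <- c(0, 1)
Sequences <- c("Seq1, Seq2, Seq3", "Seq4")
df <- data.frame(Orthogroup, Sequences)
# Split each comma-separated cell into one row per item
df %>%
  separate_rows(Sequences, sep = ", ")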
#> # A tibble: 4 × 2
#>   Orthogroup Sequences
#>        <dbl> <chr>
#> 1          0 Seq1
#> 2          0 Seq2
#> 3          0 Seq3
#> 4          1 Seq4
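If you would rather not load a package, roughly the same reshaping can be sketched in base R. This is only an illustration of the idea, not what `separate_rows()` does internally. The name `long` and the output column name `Sequence` (taken from the desired table in the question) are my own choices, and I'm assuming the items are always separated by a comma with optional whitespace after it.

```r
# Same toy data as in the answer above
df <- data.frame(
  Orthogroup = c(0, 1),
  Sequences = c("Seq1, Seq2, Seq3", "Seq4")
)

# Split each cell on a comma followed by optional whitespace;
# as.character() guards against factor columns in R < 4.0
parts <- strsplit(as.character(df$Sequences), ",\\s*")

# Repeat each Orthogroup once per item found in its cell,
# then flatten the list of items into a single column
long <- data.frame(
  Orthogroup = rep(df$Orthogroup, lengths(parts)),
  Sequence = unlist(parts)
)
long
```

The key step is `rep(df$Orthogroup, lengths(parts))`, which lines each category up with every item that was split out of its row. If your real data has more columns to carry along, the `tidyr` approach is probably less fiddly.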