Tags: r, data-manipulation

R - Can we create a list of values from a column that occasionally has multiple values listed in a single row?


I have a data frame (`df`) that contains many genes and information about those genes. It looks something like this:

| Genes             | Log2FC | SD | Pvalue |
|-------------------|-------:|---:|-------:|
| A2M               |      2 |  3 |  0.001 |
| Aars              |      4 |  4 |  0.001 |
| Actb;Actg1        |      3 |  5 |  0.001 |
| Cxcl1;Cxcl2;Cxcl3 |      5 |  6 |  0.001 |
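To make the question reproducible, the example table can be rebuilt as a data frame. This is a sketch that copies the values shown above; it assumes the object is called `df` and the gene column is named `Genes`, matching the table headers.

```r
# Reproducible version of the example table (values copied from the question).
df <- data.frame(
  Genes  = c("A2M", "Aars", "Actb;Actg1", "Cxcl1;Cxcl2;Cxcl3"),
  Log2FC = c(2, 4, 3, 5),
  SD     = c(3, 4, 5, 6),
  Pvalue = c(0.001, 0.001, 0.001, 0.001),
  # Keep Genes as character (not factor) so strsplit() works on any R version.
  stringsAsFactors = FALSE
)

# Extracting the first column still gives the combined "Actb;Actg1"-style entries.
first_col <- df[, 1]
```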

Normally I would get the list of genes with something like `df[,1]`. However, in this case some rows contain multiple genes separated by a semicolon (`;`). Is it possible to split these out into individual genes?

Using `df[,1]`, I get: A2M, Aars, Actb;Actg1, Cxcl1;Cxcl2;Cxcl3

What I want instead is: A2M, Aars, Actb, Actg1, Cxcl1, Cxcl2, Cxcl3

I can do this in Excel with the "Text to Columns" feature, but I would like to do everything in R. Any help would be greatly appreciated.

Thank you!


Solution

  • If you want each gene on its own row in the data frame, `tidyr::separate_rows(your_data, Genes, sep = ";")` should work. If you want the genes as a character vector outside the data frame, use `your_data$Genes |> strsplit(split = ";") |> unlist()` (the native pipe `|>` requires R 4.1.0 or later).
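A minimal, self-contained sketch of both approaches in base R, assuming the example data from the question with a character column named `Genes`. `fixed = TRUE` treats `;` as a literal string rather than a regular expression. The second part is a base R stand-in for `separate_rows()`, shown for readers who cannot install tidyr; it is not what tidyr does internally.

```r
# Example data from the question (assumed column name: Genes).
df <- data.frame(
  Genes  = c("A2M", "Aars", "Actb;Actg1", "Cxcl1;Cxcl2;Cxcl3"),
  Log2FC = c(2, 4, 3, 5),
  stringsAsFactors = FALSE
)

# 1) Plain character vector of individual genes.
#    strsplit() returns a list with one element per row; unlist() flattens it.
genes <- unlist(strsplit(df$Genes, split = ";", fixed = TRUE))
# genes is c("A2M", "Aars", "Actb", "Actg1", "Cxcl1", "Cxcl2", "Cxcl3")

# 2) One row per gene, keeping the other columns.
pieces <- strsplit(df$Genes, split = ";", fixed = TRUE)
# Repeat each row once per gene it contains (lengths() counts genes per row).
long <- df[rep(seq_len(nrow(df)), lengths(pieces)), , drop = FALSE]
# Rows were repeated in the same order as the flattened pieces, so they line up.
long$Genes <- unlist(pieces)
rownames(long) <- NULL
```

If the same gene can appear in more than one row, wrap the vector in `unique()`. In tidyr 1.3.0 and later, `separate_rows()` is marked as superseded in favour of `separate_longer_delim()`, though it still works.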