r dataframe ggplot2 tidyverse data-wrangling

Using R, how to regroup multiple dataframe columns into a smaller number of new columns

I am carrying out some exercises from a very good online R/bioinformatics course. To this end I am wrangling with data in the form of a 'SummarizedExperiment' object from a Bioconductor package of the same name. The rows consist of gene names and gene expression values; the columns consist of 9 ctrl (control) samples, 9 'drug1' treated samples and 9 'drug5' treated samples. Here is what the table looks like: The task is to regroup data in this dataframe so that CTRL0_1 - CTRL0_9 are placed in a single column, named 'CTRL0'. In the same fashion, new 'DRUG1' and 'DRUG5' named columns are needed consisting of gene expression for each gene in the columns DRUG1_1 - DRUG1_9 and DRUG5_1 - DRUG5_9, respectively. Data are derived from the final question on this webpage: https://uclouvain-cbio.github.io/WSBIM1207/sec-bioinfo.html The task is to generate a ggplot like this: Instead, with my inelegant code I get this: To generate MY plot, I used this code:

table_gene_gp_expr <- geneCol_filter_assaySeDf %>% 
  pivot_longer(cols = CTRL0_1:CTRL0_9,
               names_to = "ctrl0",
               values_to = "ctrl0_expr") %>% 
  pivot_longer(cols = DRUG1_1:DRUG1_9,
               names_to = "drug1",
               values_to = "drug1_expr") %>% 
  pivot_longer(cols = DRUG5_1:DRUG5_9,
               names_to = "drug5",
               values_to = "drug5_expr") %>% 
  pivot_longer(cols = c(ctrl0_expr, drug1_expr, drug5_expr),
               names_to = "group",
               values_to = "gene_expression") %>% 
  select(gene, group, gene_expression)

I have used consecutive pivot_longer() commands, but I wonder if there is a more efficient way to generate the table below that I used for MY ggplot. Note the data in MY gggplot and the correct ggplot are different, which I can't explain. The head() of the table I used looks like this: I can post the code I wrote to generate the 'wide' dataframe of the 1st table shown if it's really needed.

Solution

Given sample data like this:

tibble(gene = c("gene1", "gene2"),
   CTRL0_1 = 1:2,
   CTRL0_2 = 3:4,
   DRUG1_1 = 5:6,
   DRUG1_2 = 7:8,
   DRUG5_1 = 9:10) -> test

Here are two variations of syntax for pivoting all your data columns and then splitting them up by group. You'd need a different approach if the data in those columns had different types.

test %>%
  pivot_longer(cols = -gene) %>%
  separate(name, c("group", "obs"), convert = TRUE) %>%
  pivot_wider(names_from = group, values_from = value)

test %>%
  pivot_longer(cols = -gene, names_to = c("group", "obs"), names_sep = "_") %>%
  pivot_wider(names_from = group, values_from = value)