I am carrying out some exercises from a very good online R/bioinformatics course. To this end I am wrangling with data in the form of a 'SummarizedExperiment' object from a Bioconductor package of the same name. The rows consist of gene names and gene expression values; the columns consist of 9 ctrl (control) samples, 9 'drug1' treated samples and 9 'drug5' treated samples. Here is what the table looks like: The task is to regroup data in this dataframe so that CTRL0_1 - CTRL0_9 are placed in a single column, named 'CTRL0'. In the same fashion, new 'DRUG1' and 'DRUG5' named columns are needed consisting of gene expression for each gene in the columns DRUG1_1 - DRUG1_9 and DRUG5_1 - DRUG5_9, respectively. Data are derived from the final question on this webpage: https://uclouvain-cbio.github.io/WSBIM1207/sec-bioinfo.html The task is to generate a ggplot like this: Instead, with my inelegant code I get this: To generate MY plot, I used this code:
table_gene_gp_expr <- geneCol_filter_assaySeDf %>%
pivot_longer(cols = CTRL0_1:CTRL0_9,
names_to = "ctrl0",
values_to = "ctrl0_expr") %>%
pivot_longer(cols = DRUG1_1:DRUG1_9,
names_to = "drug1",
values_to = "drug1_expr") %>%
pivot_longer(cols = DRUG5_1:DRUG5_9,
names_to = "drug5",
values_to = "drug5_expr") %>%
pivot_longer(cols = c(ctrl0_expr, drug1_expr, drug5_expr),
names_to = "group",
values_to = "gene_expression") %>%
select(gene, group, gene_expression)
I have used consecutive pivot_longer()
commands, but I wonder if there is a more efficient way to generate the table below that I used for MY ggplot. Note the data in MY gggplot and the correct ggplot are different, which I can't explain. The head()
of the table I used looks like this: I can post the code I wrote to generate the 'wide' dataframe of the 1st table shown if it's really needed.
Given sample data like this:
tibble(gene = c("gene1", "gene2"),
CTRL0_1 = 1:2,
CTRL0_2 = 3:4,
DRUG1_1 = 5:6,
DRUG1_2 = 7:8,
DRUG5_1 = 9:10) -> test
Here are two variations of syntax for pivoting all your data columns and then splitting them up by group. You'd need a different approach if the data in those columns had different types.
test %>%
pivot_longer(cols = -gene) %>%
separate(name, c("group", "obs"), convert = TRUE) %>%
pivot_wider(names_from = group, values_from = value)
test %>%
pivot_longer(cols = -gene, names_to = c("group", "obs"), names_sep = "_") %>%
pivot_wider(names_from = group, values_from = value)