I have been trying to figure out a more efficient way to summarize rowMeans for various samples belonging to the same group. I couldn't find a good post to what I would think would be a common problem.
I'd like to have something more automated using a metadata file but can't figure it out. Any guidance would be greatly appreciated!
file
looks like this:
Target_Name | Target_position | Sample_1 | Sample_2 | Sample_3 | Sample_4 |
---|---|---|---|---|---|
Target-1 | 1 | 100 | 105 | 54 | 55 |
Target-1 | 2 | 120 | 125 | 57 | 58 |
metadata
looks like this:
group | sample |
---|---|
Control | Sample_1 |
Control | Sample_2 |
Test | Sample_3 |
Test | Sample_4 |
My Expected output looks like this:
Target_Name | Target_position | Sample_1 | Sample_2 | Sample_3 | Sample_4 | Control_Mean | Test_Mean |
---|---|---|---|---|---|---|---|
Target-1 | 1 | 100 | 105 | 54 | 55 | 102.5 | 54.5 |
Target-1 | 2 | 120 | 125 | 57 | 58 | 122.5 | 57.5 |
I am able to generate the expected output using
file$Control_Mean <- rowMeans(file[ , c(3,4)], na.rm=FALSE)
file$Test_Mean <- rowMeans(file[ , c(5,6)], na.rm=FALSE)
But I was wondering if there is a more automated approach, as the group names in my metadata file will be different all of the time and not just called Control or Test. Any suggestions? Thanks!
Here is a solution using dplyr
and tidyr
Packaged
library(dplyr)
library(tidyr)
Input Data
# Data
file <- tibble::tribble(
~Target_Name, ~Target_position, ~Sample_1, ~Sample_2, ~Sample_3, ~Sample_4,
"Target-1", 1L, 100L, 105L, 54L, 55L,
"Target-1", 2L, 120L, 125L, 57L, 58L
)
metadata <- tibble::tribble(
~group, ~sample,
"Control", "Sample_1",
"Control", "Sample_2",
"Test", "Sample_3",
"Test", "Sample_4"
)
Solution
file %>%
pivot_longer(Sample_1:Sample_4, names_to = "sample", values_to = "value") %>%
left_join(metadata) %>%
group_by(Target_position, group) %>%
summarise(mean = mean(value)) %>%
ungroup() %>%
pivot_wider(id_cols = Target_position, names_from = "group", values_from = "mean",
names_prefix = "Mean_") %>%
left_join(file) %>%
select(Target_Name, Target_position, Sample_1:Sample_4, starts_with("Mean"))
Output
## A tibble: 2 × 8
# Target_Name Target_position Sample_1 Sample_2 Sample_3 Sample_4 Mean_Control Mean_Test
# <chr> <int> <int> <int> <int> <int> <dbl> <dbl>
#1 Target-1 1 100 105 54 55 102.5 54.5
#
#2 Target-1 2 120 125 57 58 122.5 57.5
Created on 2022-12-08 with reprex v2.0.2