I am trying to calculate the means of four groups. My data frame looks something similar to the following:
Sex <- c("F", "F", "M", "M", "F")
Phenotype <- c(Control, Experimental, Experimental, Control, Control)
MOp_Amygdala <- c("10", "15", "2", "6", "8")
MOp_Thalamus <- c("19", "12", "4", "4", "6")
MOp_Cerebellum <- c("34", "45", "67", "78", "99")
MOq_Cortex <- c("2", "5", "6", "17", "2")
MOq_Striatum <- c("100", "101", "102", "106", "200")
df <- data.frame(Sex, Phenotype, MOp_Amygdala, MOp_Thalamus, MOp_Cerebellum, MOq_Cortex, MOq_Striatum)
I want to find the mean of the amygdala, Thalamus, and Cerebellum of my four groups: M-Control, M-Experimental, F-Control and F-Experimental.
This I what I have tried so far:
Q1 <- data %>%
group_by(Sex, Phenotype)%>%
select(starts_with("MOp")) %>%
rowwise() %>%
mutate(Group_Means = mean(c(MOp_Amygdala, MOp_Thalamus, MOp_Cerebellum))) #redundant
My main issue with the output is that it seems like group_by is not working. Instead of 4 observations (M-Control, M-Experimental, F-Control, and F-Experimental) I end up with 5 observations, one for each sample.
You can calculate the mean
of each group by summarizing the data frame.
I've modified your input data:
Sex <- rep(c("F", "F", "M", "M", "F"), 5)
Phenotype <- rep(c('Control', 'Experimental', 'Experimental', 'Control', 'Control'), 5)
MOp_Amygdala <- c(10, 15, 2, 6, 8, sample(seq(1,20,1), 20, replace = TRUE))
MOp_Thalamus <- c(19, 12, 4, 4, 6, sample(seq(1,20,1), 20, replace = TRUE))
MOp_Cerebellum <- c(34, 45, 67, 78, 99, sample(seq(20,100,1), 20, replace = TRUE))
MOq_Cortex <- c(2, 5, 6, 17, 2, sample(seq(1,20,1), 20, replace = TRUE))
MOq_Striatum <- c(100, 101, 102, 106, 200, sample(seq(100,200,1), 20, replace = TRUE))
df <- data.frame(Sex, Phenotype, MOp_Amygdala, MOp_Thalamus, MOp_Cerebellum, MOq_Cortex, MOq_Striatum)
library(tidyverse)
glimpse(df)
#> Rows: 25
#> Columns: 7
#> $ Sex <chr> "F", "F", "M", "M", "F", "F", "F", "M", "M", "F", "F", …
#> $ Phenotype <chr> "Control", "Experimental", "Experimental", "Control", "…
#> $ MOp_Amygdala <dbl> 10, 15, 2, 6, 8, 16, 6, 14, 3, 2, 16, 20, 15, 15, 2, 8,…
#> $ MOp_Thalamus <dbl> 19, 12, 4, 4, 6, 14, 9, 12, 2, 9, 17, 17, 4, 7, 16, 9, …
#> $ MOp_Cerebellum <dbl> 34, 45, 67, 78, 99, 73, 21, 94, 30, 75, 54, 80, 48, 27,…
#> $ MOq_Cortex <dbl> 2, 5, 6, 17, 2, 6, 8, 5, 10, 4, 7, 14, 8, 1, 12, 11, 12…
#> $ MOq_Striatum <dbl> 100, 101, 102, 106, 200, 192, 193, 162, 121, 198, 109, …
Here's one way to calculate averages for each group for these three columns:
df %>%
summarise(across(starts_with('MOp'), mean),
.by = c(Sex, Phenotype))
#> Sex Phenotype MOp_Amygdala MOp_Thalamus MOp_Cerebellum
#> 1 F Control 7.9 12.3 61.5
#> 2 F Experimental 14.6 13.2 52.6
#> 3 M Experimental 12.4 10.8 73.6
#> 4 M Control 7.2 5.4 50.4
Created on 2023-07-24 with reprex v2.0.2