Apologies if this has been asked elsewhere or if I am using the wrong terms. I have been searching for the correct way to do this, with no success so far.
I have an experimental design with 3 experimental conditions using repeated measures outcomes (each participant completes 4 trials). The data I have currently is in long format (each participant ID is repeated 4 times). I am trying to calculate summary statistics for the demographic variables (age, gender, condition etc.) but I cannot figure out how to, for lack of a better word, collapse/merge the rows for each participant together to get the frequency data and/or summary stats.
Below is a simulated dataset:
require(tidyverse)
require(summarytools)
require(skimr)
require(lme4)
require(wakefield) #to simulate age distribution
require(reshape2)
id <- rep(1:150, each = 4)
age <- rep(age(150, x = 18:21), each = 4)
gender <- rep(c("male", "male", "male", "male", "female", "female","female","female"), each = 25, times = 3)
condition <- rep(c("condition_1", "condition_2", "condition_3"), each = 4, times = 50) #condition
control_1 <- rep(c("order_1", "order_2"), each = 4, length.out = 600) # control variable for counterbalancing
control_2 <- rep(c("group_1", "group_2"), each = 75, length.out = 600) # control variable for counterbalancing
test1_trial <- rep(c("trial_1", "trial_2", "trial_3", "trial_4"), each = 1, length.out = 600)
test1_outcome <- rbinom(600, 1, 0.5) # actual data
test2_trial <- rep(c("trial_1", "trial_2", "trial_3", "trial_4"), each = 1, length.out = 600)
test2_outcome <- rbinom(600, 1, 0.5) # actual data
dat <- data.frame(id, age, gender, condition, control_1, control_2, test1_trial, test1_outcome, test2_trial, test2_outcome)
I have tried using group_by like so:
dat %>%
  group_by(id) %>%
  freq(age)
but this gives me each id as a separate group which is obviously not helpful for summary statistics.
I also tried using summarise_all but could not get it to work:
dat$id <- as.factor(dat$id)
dat %>%
select(id, age)
group_by(id) %>%
summarise_all(funs(sum))
Error in UseMethod("group_by") : no applicable method for 'group_by' applied to an object of class "c('integer', 'numeric')"
For the summary statistics, I don't care about the actual data (i.e. test1_outcome and test2_outcome), I just want to be able to calculate e.g., the mean age, number of participants per condition etc. Is there a way I can somehow select just the control/demographic variables and collapse them for each participant?
Apologies for the basic question, I do not usually work with repeated measures designs and so am not super familiar with long format data.
If your demographic data don't vary across treatment rounds, you can just run distinct() or unique() by id, similar to what Jon Spring suggested, like this:
dat %>%
  distinct(id, age, gender)
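Before relying on distinct(), it may be worth checking the assumption that each demographic variable takes a single value per participant. Below is a minimal sketch of one way to do that; the data frame `toy` and the object names `check` and `demo` are made up for illustration and are not the asker's `dat`.

```r
library(dplyr)

# Hypothetical toy data: 3 participants, 4 trials each, in long format
toy <- data.frame(
  id     = rep(1:3, each = 4),
  age    = rep(c(18, 20, 21), each = 4),
  gender = rep(c("male", "female", "female"), each = 4),
  trial  = rep(paste0("trial_", 1:4), times = 3)
)

# Count the distinct values of each demographic variable within each id;
# a count of 1 everywhere means the variable is constant per participant
check <- toy %>%
  group_by(id) %>%
  summarise(across(c(age, gender), n_distinct))

# Collapse to one row per participant
demo <- toy %>%
  distinct(id, age, gender)
```

If any count in `check` is greater than 1, distinct() will return more than one row for that participant. In the simulated data from the question, control_2 switches every 75 rows, which is not a multiple of 4, so it changes within some ids (for example id 19) and would need to be left out of distinct() or fixed first.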
You can then group by condition (or any other variable) to get summary statistics along with the number of participants in each group:
dat %>%
  distinct(id, age, gender, condition) %>%
  group_by(condition, gender) %>%
  # summarise only the columns of interest; averaging id would be meaningless
  summarise(n = n(), mean_age = mean(age), .groups = "drop")
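For the overall figures mentioned in the question (mean age, number of participants per condition), the same collapse-then-summarise idea can be sketched as follows. This again uses a small hypothetical data frame `toy`, and the names `participants`, `overall` and `per_condition` are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical toy data: 6 participants, 4 trials each, 3 conditions
toy <- data.frame(
  id        = rep(1:6, each = 4),
  age       = rep(c(18, 19, 20, 21, 18, 20), each = 4),
  condition = rep(c("condition_1", "condition_2", "condition_3"),
                  each = 4, times = 2)
)

# One row per participant, dropping the trial-level repetition
participants <- toy %>%
  distinct(id, age, condition)

# Overall summary statistics across participants (not across trials)
overall <- participants %>%
  summarise(n = n(), mean_age = mean(age), sd_age = sd(age))

# Number of participants in each condition
per_condition <- participants %>%
  count(condition)
```

Summarising `participants` rather than the long-format data keeps each person from being counted four times, which matters for counts and standard deviations.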