I want to take a sample per group while making sure that no participant appears twice across the samples (I need this for a between-subjects ANOVA). I have a dataframe in which some participants (not all) appear twice, each time in a different group, i.e. Peter can appear in group v1=A and v2=1 but theoretically also in group v1=B and v2=3. A group is defined by the two variables v1 and v2, so there are up to 8 possible groups (the example code below happens to fill only 4 of them: A/1, A/3, B/2 and B/4).
Now, I want to avoid any participant appearing twice in the data by taking samples per group and randomly dropping one of the two observations of each duplicated participant, while keeping the samples similarly sized. I constructed the following ugly code to showcase my problem.
How do I get the last step done, so that no participant appears twice across the samples and I only have unique cases across all samples?
df1 <- data.frame(ID=c("peter","peter","chris","john","george","george","norman","josef","jan","jan","richard","richard","paul","christian","felix","felix","nick","julius","julius","moritz"),
v1=rep(c("A","B"),10),
v2=rep(c(1:4),5))
library(dplyr)
df2 <- df1 %>% group_by(v1,v2) %>% sample_n(2)
You could first take a sample of size 1 per 'ID', so that every participant is left with a single row, then group by 'v1' and 'v2' and take another sample of size 2.
library(dplyr)
set.seed(1)
df2 <- df1 %>%
  group_by(ID) %>%
  sample_n(1) %>%        # keep one random row per participant
  group_by(v1, v2) %>%
  sample_n(2)            # then draw 2 participants per group
df2
# Groups: v1, v2 [4]
# ID v1 v2
# <fct> <fct> <int>
# 1 paul A 1
# 2 jan A 1
# 3 norman A 3
# 4 richard A 3
# 5 george B 2
# 6 peter B 2
# 7 moritz B 4
# 8 felix B 4
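One caveat with the approach above: after the first step a group can end up with fewer than 2 rows (in this data, B/2 has only one participant, christian, who never appears elsewhere), and `sample_n(2)` then stops with an error. Here is a minimal sketch of a variant that tolerates this, assuming dplyr >= 1.0.0, where `slice_sample()` supersedes `sample_n()` and silently returns fewer rows when a group runs short. The name `df3` is only illustrative, and the final check simply confirms that no ID is repeated.

```r
library(dplyr)

# Same example data as in the question
df1 <- data.frame(
  ID = c("peter", "peter", "chris", "john", "george", "george", "norman",
         "josef", "jan", "jan", "richard", "richard", "paul", "christian",
         "felix", "felix", "nick", "julius", "julius", "moritz"),
  v1 = rep(c("A", "B"), 10),
  v2 = rep(1:4, 5)
)

set.seed(1)
df3 <- df1 %>%               # df3 is a hypothetical name for this sketch
  group_by(ID) %>%
  slice_sample(n = 1) %>%    # keep one random row per participant
  group_by(v1, v2) %>%
  slice_sample(n = 2) %>%    # at most 2 participants per group
  ungroup()

# No participant may appear more than once across the samples
stopifnot(!any(duplicated(df3$ID)))

# Group sizes actually obtained (can be below 2 if a group ran short)
count(df3, v1, v2)
```

If equal group sizes matter for the ANOVA, it is probably worth inspecting the `count()` result and re-drawing with a different seed when a group comes out short.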