First, here is some toy data:
df <- data.frame(
"stim" = c("face", "object", "pareidolia", "face", "face", "object", "pareidolia", "object"),
"RT" = c(23, 24, 25, 26, 27, 22, 25, 23),
"Opac" = c(70, 60, 80, 65, 60, 61, 59, 70)
)
I want to ensure that the dataset contains an equal number of rows for each level of stim. I am using the following code to attempt this:
library(dplyr)
newdf <- df %>%
mutate(mn = min(table(stim))) %>%
group_by(stim) %>%
sample_n(mn[1]) %>%
ungroup()
This works almost perfectly, except that it reorders the data. My desired output would look like the following:
stim RT Opac
face 23 70
object 24 60
pareidolia 25 80
face 26 65
object 22 61
pareidolia 25 59
But this code outputs this:
stim RT Opac
face 23 70
face 26 65
object 24 60
object 22 61
pareidolia 25 80
pareidolia 25 59
I realize that this is likely happening because I am using table(), but I'm not sure how else to go about this. Any suggestions would be appreciated.
Also, a bonus side question: is there a way (a function, code snippet, etc.) to determine the row numbers of the rows that are removed as part of this process?
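You could use a filtering strategy rather than sample_n. The reordering comes from group_by() followed by sample_n(), which returns the sampled rows group by group; table() is not the cause. filter() keeps the rows in their original order (note that the .by argument requires dplyr 1.1.0 or later):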
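df %>%
  # size of the smallest stim group, repeated on every row
  mutate(mn = min(table(stim))) %>%
  # within each stim, keep mn randomly chosen rows; row order is preserved
  filter(sample(seq_along(stim)) <= mn, .by = stim) %>%
  select(-mn)  # drop the helper column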
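For the bonus question, one possible approach (a minimal sketch, not the only way to do it) is to record each row's original position before filtering and then compare against the full set of row numbers afterwards. The column name `row` and the objects `kept` and `dropped_rows` are names I made up for illustration, and `set.seed()` is only there so the sketch is reproducible. This assumes dplyr 1.1.0 or later for `.by`.

```r
library(dplyr)

df <- data.frame(
  stim = c("face", "object", "pareidolia", "face", "face", "object", "pareidolia", "object"),
  RT   = c(23, 24, 25, 26, 27, 22, 25, 23),
  Opac = c(70, 60, 80, 65, 60, 61, 59, 70)
)

set.seed(1)  # assumption: fixed seed, only for reproducibility

# Size of the smallest group; every group is cut down to this many rows
mn <- min(table(df$stim))

kept <- df %>%
  mutate(row = row_number()) %>%                      # hypothetical helper column: original row position
  filter(sample(seq_along(stim)) <= mn, .by = stim)   # keep mn random rows per stim, order preserved

# Original row numbers that did not survive the filter
dropped_rows <- setdiff(seq_len(nrow(df)), kept$row)

kept
dropped_rows
df[dropped_rows, ]  # the rows that were removed
```

Which rows end up in `dropped_rows` depends on the random draw; with this toy data it will always be one face row and one object row, since those groups have three rows each and pareidolia has two.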