Combining sample() and group_by() from tidyverse

Below, I'm trying to randomly select the rows belonging to one group value within each study in my data. How can I do this?

First we group_by(study), then for each study we randomly decide which group's rows to keep, using one draw per study:

# one draw per study: 0 = all rows, 1 or 2 = only that group's rows
group_row <- sample(0:2, length(unique(data$study)), replace = TRUE)

For each study in group_by(study):

If group_row is 1, select the group == 1 rows of that study.

If group_row is 2, select the group == 2 rows of that study.

If group_row is 0, select ALL rows of that study.
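The three rules above collapse into a single condition once each study carries its own draw: keep a row when group_row == 0 or when group == group_row. A minimal sketch of that idea, assuming the data built in the question; the lookup table picks and the join are my own illustration, not part of the original post, and set.seed() is only there to make the draw reproducible.

```r
library(dplyr)
library(tidyr)

data <- expand_grid(study = 1:3, group = 1:2,
                    outcome = c("A", "B"), time = 0:1) %>%
  as.data.frame()

set.seed(1)  # only to make the random draw reproducible
# Hypothetical lookup table: one draw per study (0 = all rows, 1/2 = that group)
picks <- tibble(study = sort(unique(data$study)),
                group_row = sample(0:2, length(unique(data$study)), replace = TRUE))

result <- data %>%
  left_join(picks, by = "study") %>%
  # draw 0 keeps every row; otherwise keep only the drawn group
  filter(group_row == 0 | group == group_row) %>%
  select(-group_row) %>%
  arrange(study, group, outcome, time)
```

Keeping picks as a separate table also makes it easy to inspect which group each study drew.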

I have tried the following, without success:

library(tidyverse)

(data <- expand_grid(study=1:3,group=1:2,outcome=c("A","B"), time=0:1) %>%
    as.data.frame())


lapply(1:2, function(i) {
  data %>%
    dplyr::group_by(group) %>%
    filter(group == if (group_row[i] == 0) unique(data$group) else group_row[i]) %>%
    dplyr::ungroup() %>%
    arrange(study, group, outcome, time)
})
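As far as I can tell, the attempt fails for two reasons: it loops over 1:2 and groups by group instead of restricting each iteration to one study, so group_row[i] is never tied to a particular study; and when the draw is 0, group == unique(data$group) compares element by element with recycling rather than testing membership, which is what %in% does. A sketch that keeps the lapply() shape but loops over studies instead (studies and keep are names I introduced for illustration):

```r
library(dplyr)
library(tidyr)

data <- expand_grid(study = 1:3, group = 1:2,
                    outcome = c("A", "B"), time = 0:1) %>%
  as.data.frame()

studies <- sort(unique(data$study))
set.seed(42)  # only for reproducibility
# One draw per study, as in the question: 0 = all rows, 1/2 = that group only
group_row <- sample(0:2, length(studies), replace = TRUE)

result <- lapply(seq_along(studies), function(i) {
  # Groups to keep for study i: all of them when the draw is 0
  keep <- if (group_row[i] == 0) unique(data$group) else group_row[i]
  data %>%
    filter(study == studies[i], group %in% keep)  # %in% tests membership
}) %>%
  bind_rows() %>%
  arrange(study, group, outcome, time)
```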

Solution

You can write a function that randomly decides which rows of a study to keep, and apply it to each study with group_by() and filter().

library(dplyr)

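return_rows <- function(x) {
  # One draw per call; inside a grouped filter() that means once per study
  n <- sample(0:2, 1)
  # n == 0: keep every row of the study (TRUE is recycled to all rows);
  # otherwise keep only the rows whose group equals n
  if (n == 0) TRUE else x == n
}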


data %>%
  group_by(study) %>%
  filter(return_rows(group)) %>%
  ungroup()
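To check that the solution behaves as described, one can summarise the result per study: each study should keep either all 8 of its rows (both groups) or just the 4 rows of a single group. A small sketch of such a check, assuming the question's data; kept, n_rows and groups are names I made up for the illustration, and set.seed() only fixes the draws.

```r
library(dplyr)
library(tidyr)

data <- expand_grid(study = 1:3, group = 1:2,
                    outcome = c("A", "B"), time = 0:1) %>%
  as.data.frame()

return_rows <- function(x) {
  n <- sample(0:2, 1)
  if (n == 0) TRUE else x == n
}

set.seed(123)  # fix the random draws so the result can be reproduced
result <- data %>%
  group_by(study) %>%
  filter(return_rows(group)) %>%
  ungroup()

# One row per study: how many rows survived and which groups they belong to
kept <- result %>%
  group_by(study) %>%
  summarise(n_rows = n(),
            groups = paste(sort(unique(group)), collapse = ","))
kept
```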