Tags: r, dataframe, dplyr, tidyverse, sample

Combining sample() and group_by() from tidyverse


Below, I'm trying to randomly select, within each study in my data, the rows that belong to one value of group. How can I do this?

The idea is to group_by(study) and then, for each study, decide which group's rows to keep based on a random draw:

# one random draw (0, 1 or 2) per study
group_row <- sample(0:2, length(unique(data$study)), replace = TRUE)

For each study in group_by(study):

if group_row is 1, keep the rows of that study where group == 1;

if group_row is 2, keep the rows of that study where group == 2;

if group_row is 0, keep ALL rows of that study.
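The rule above can be sketched in base R by splitting the data by study and applying one draw to each piece. This is a minimal illustration, not the asker's code: the helper `keep_rows()` is hypothetical, and the draws are fixed to `c(1, 0, 2)` (assumed values) so the result is predictable.

```r
# Example data with the same layout as the question's expand_grid() call.
# expand.grid() varies its first argument fastest, so the arguments are listed
# in reverse and the columns reordered to study/group/outcome/time.
data <- expand.grid(time = 0:1, outcome = c("A", "B"), group = 1:2, study = 1:3,
                    stringsAsFactors = FALSE)[, c("study", "group", "outcome", "time")]

# Hypothetical helper: apply the rule to the rows of a single study.
# pick 0 keeps every row; otherwise only rows with group == pick are kept.
keep_rows <- function(rows, pick) {
  if (pick == 0) rows else rows[rows$group == pick, , drop = FALSE]
}

# One draw per study, fixed here (assumed values) so the output is reproducible.
# A random version would be: sample(0:2, length(unique(data$study)), replace = TRUE)
group_row <- c(1, 0, 2)

# split() orders the pieces by study (1, 2, 3), matching the order of group_row.
by_study <- split(data, data$study)
result <- do.call(rbind, Map(keep_rows, by_study, group_row))
rownames(result) <- NULL

table(result$study, result$group)
```

With these draws, study 1 keeps only its group 1 rows, study 2 keeps all 8 rows, and study 3 keeps only its group 2 rows.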

I have tried the following, without success:

library(tidyverse)

(data <- expand_grid(study=1:3,group=1:2,outcome=c("A","B"), time=0:1) %>%
    as.data.frame())


lapply(1:2, function(i) {
  data %>%
    dplyr::group_by(group) %>%
    filter(group == if (group_row[i] == 0) unique(data$group) else group_row[i]) %>%
    dplyr::ungroup() %>%
    arrange(study, group, outcome, time)
})
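The attempt above groups by group rather than by study, and each lapply() iteration applies a single draw to every study; in addition, `group == unique(data$group)` compares element by element with recycling instead of testing membership. One possible way to keep a precomputed draw per study, sketched under the assumption that dplyr and tidyr are installed, is to store the draws in a small lookup table, join it on study, and filter on it. The names `picks` and `selected` and the seed are illustrative choices.

```r
library(dplyr)
library(tidyr)

data <- expand_grid(study = 1:3, group = 1:2, outcome = c("A", "B"), time = 0:1)

# One random draw per study, kept in a lookup table so it can be inspected later
set.seed(42)  # arbitrary seed, only for reproducibility
picks <- tibble(
  study = unique(data$study),
  group_row = sample(0:2, n_distinct(data$study), replace = TRUE)
)

selected <- data %>%
  inner_join(picks, by = "study") %>%
  # group_row 0 keeps every row of the study; otherwise keep only the matching group
  filter(group_row == 0 | group == group_row) %>%
  select(-group_row) %>%
  arrange(study, group, outcome, time)

picks
selected
```

Keeping the draws in `picks` makes it easy to check afterwards which group (or all rows) was chosen for each study.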

Solution

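  • You can write a function that randomly decides, for a single study, which rows to keep (the rows of one group, or all of them) and apply it to each study with group_by(study) and filter().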

    library(dplyr)
    
    return_rows <- function(x) {
      n <- sample(0:2, 1)
      # If n is 0, keep all rows of the study; otherwise keep
      # only the rows whose group equals n
      if(n == 0) TRUE else x == n
    }
    
    
    data %>%
      group_by(study) %>%
      filter(return_rows(group)) %>%
      ungroup()
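Because filter() on grouped data evaluates its expression once per study, return_rows() calls sample() once per study. A minimal sketch for making a run reproducible and checking what was kept, assuming dplyr and tidyr are installed (the seed value 123 is arbitrary):

```r
library(dplyr)
library(tidyr)

data <- expand_grid(study = 1:3, group = 1:2, outcome = c("A", "B"), time = 0:1)

return_rows <- function(x) {
  n <- sample(0:2, 1)
  # n == 0 keeps every row; otherwise keep rows whose group equals n
  if (n == 0) TRUE else x == n
}

set.seed(123)  # arbitrary seed so the random draws can be reproduced
result <- data %>%
  group_by(study) %>%
  filter(return_rows(group)) %>%
  ungroup()

# Groups that survived in each study: "1", "2", or "1,2" when all rows were kept
result %>%
  group_by(study) %>%
  summarise(groups_kept = paste(sort(unique(group)), collapse = ","), n = n())
```

Every study appears in the result, with either 4 rows from a single group or all 8 rows.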