Search code examples
rpermutation

How to compute a permutation test controlled for by a different group?


Run the code below to generate the dataset of this question.

dt  <- data.table(
   classA = rep(c("C", "D"), 8),
   classB = rep(c("A", "B"), each = 8),
   priceA = c(0.5, 0.6, 1.1, 1.1, 0.6, 0.6, -0.2, 1.3, -0.4, 3, -0.4, 2.3, -0.2,     1.8, 0.4, 0.4)
)

We apply permutation testing to test if an association exists between the variable classA and the variable price. We provide the following code:

T_obs <- dt[, mean(priceA), by = classA][, diff(V1)]
T_star <- sapply(1:1000, function(i){
    dt[, price_sample := sample(priceA)]
    dt[, mean(price_sample), by = classA][, diff(V1)]
})
p_val_right<-(sum(T_star >= T_obs)+ 1) / (1000 + 1)
p_val_left<- (sum(T_star <= T_obs)+ 1) / (1000 + 1)
pval<-2*min(pval_l, p_val_r, na.rm = TRUE)
  1. Test if the association could be confounded by classB. To control for groupB, adapt the code. Provide R code that computes T_star and T_obs conditioned on the third variable groupB. As overall test statistics, use the p_value averaged across the groups defined by the variable groupB.

How can I solve this question? Online I can only find more advanced code but we need to solve it simpler.


Solution

  • I tried using this code:

    T_obs <- dt[classB=="C", mean(price), by = classA][, diff(V1)]
    
    T_star <- sapply(1:1000, function(i){
      dt[classB=="C", price_sampled := sample(price)]
      dt[classB=="C", mean(price_sampled), by = classA][, diff(V1)]
    })
    p_val_r<-(sum(T_star >= T_obs)+ 1) / (1000 + 1)
    pval_l<- (sum(T_star <= T_obs)+ 1) / (1000 + 1)
    pval_groupC<-2*min(pval_l, p_val_r, na.rm = TRUE)
    
    
    T_obs <- dt[classB=="D", mean(price), by = classA][, diff(V1)]
    T_star <- sapply(1:1000, function(i){
      dt[classB=="D", price_sampled := sample(price)]
      dt[classB=="D", mean(price_sampled), by = classA][, diff(V1)]
    })
    p_val_r<-(sum(T_star >= T_obs)+ 1) / (1000 + 1)
    pval_l<- (sum(T_star <= T_obs)+ 1) / (1000 + 1)
    pvalgroupD<-2*min(pval_l, p_val_r, na.rm = TRUE)
    
    pval<- (pvalgroupD+pval_groupC)/2
    pval
    

    Is this is correct?