Run the code below to generate the dataset of this question.
dt <- data.table(
classA = rep(c("C", "D"), 8),
classB = rep(c("A", "B"), each = 8),
priceA = c(0.5, 0.6, 1.1, 1.1, 0.6, 0.6, -0.2, 1.3, -0.4, 3, -0.4, 2.3, -0.2, 1.8, 0.4, 0.4)
)
We apply permutation testing to test if an association exists between the variable classA and the variable price. We provide the following code:
T_obs <- dt[, mean(priceA), by = classA][, diff(V1)]
T_star <- sapply(1:1000, function(i){
dt[, price_sample := sample(priceA)]
dt[, mean(price_sample), by = classA][, diff(V1)]
})
p_val_right<-(sum(T_star >= T_obs)+ 1) / (1000 + 1)
p_val_left<- (sum(T_star <= T_obs)+ 1) / (1000 + 1)
pval<-2*min(pval_l, p_val_r, na.rm = TRUE)
How can I solve this question? Online I can only find more advanced code but we need to solve it simpler.
I tried using this code:
T_obs <- dt[classB=="C", mean(price), by = classA][, diff(V1)]
T_star <- sapply(1:1000, function(i){
dt[classB=="C", price_sampled := sample(price)]
dt[classB=="C", mean(price_sampled), by = classA][, diff(V1)]
})
p_val_r<-(sum(T_star >= T_obs)+ 1) / (1000 + 1)
pval_l<- (sum(T_star <= T_obs)+ 1) / (1000 + 1)
pval_groupC<-2*min(pval_l, p_val_r, na.rm = TRUE)
T_obs <- dt[classB=="D", mean(price), by = classA][, diff(V1)]
T_star <- sapply(1:1000, function(i){
dt[classB=="D", price_sampled := sample(price)]
dt[classB=="D", mean(price_sampled), by = classA][, diff(V1)]
})
p_val_r<-(sum(T_star >= T_obs)+ 1) / (1000 + 1)
pval_l<- (sum(T_star <= T_obs)+ 1) / (1000 + 1)
pvalgroupD<-2*min(pval_l, p_val_r, na.rm = TRUE)
pval<- (pvalgroupD+pval_groupC)/2
pval
Is this is correct?