I just started coding in R and I have a question about applying Chi-square test to a dataset by 2 columns at a time.
I would like to do a paired analysis (Tumor and Normal sample come from the same patient, so Primary Tumor 1 and Normal Tissue 1 comes from the same patient). I would like to see differences in the distribution between tumour and normal sample from the same patient and apply to all 50 patients.
I tried Chi-square goodness of fit previously, with expected probability I calculated from taking average from all normal samples.
The code I used is:
apply(mydata, 2, chisq.test, p=myprobability)
This time, I want to conduct Pearson's Chi-square test (not goodness of fit) to tumour and its matched normal tissue.
So, I would like to run Chi-square test by two columns: Primary Tumor 1 + Normal 1 ... Then next, Primary Tumor 2 + Normal 2
and get a table of Chi-square statistics and p-values. (In this case, I would have to use adjusted p-values right? because I ran it on 50 sets of samples?)
As a reproductible example...
mydata <-
structure(list(Tumor1 = c(17, 28, 80, 63, 20,
10), Normal1 = c(18, 27, 89, 62, 24,
11), Tumor2 = c(25, 40, 80, 65, 23,
11), Normal2 = c(27, 29, 100, 72, 34,
6)), class = "data.frame",
row.names = c("trim3", "trim2", "trim1", "add1", "add2",
"add3"))
head(mydata)
Tumor1 Normal1 Tumor2 Normal2
trim3 17 18 25 27
trim2 28 27 40 29
trim1 80 89 80 100
add1 63 62 65 72
add2 20 24 23 34
add3 10 11 11 6
I tried to use apply function like I did for goodness of fit, but I could not get it to work.
Thank you
You can consider doing a Cochran–Mantel–Haenszel Test which is a test for the independence of two variables with repeated measurements, in your case, different tumour / normal pairs. So using your example, we get an array first:
test = array(unlist(mydata),dim=c(nrow(mydata),2,ncol(mydata)/2))
test
, , 1
[,1] [,2]
[1,] 17 18
[2,] 28 27
[3,] 80 89
[4,] 63 62
[5,] 20 24
[6,] 10 11
, , 2
[,1] [,2]
[1,] 25 27
[2,] 40 29
[3,] 80 100
[4,] 65 72
[5,] 23 34
[6,] 11 6
Then do:
mantelhaen.test(test)
Cochran-Mantel-Haenszel test
data: test
Cochran-Mantel-Haenszel M^2 = 5.0277, df = 5, p-value = 0.4125
Of course you can test each sample pair individually:
library(broom)
# assign groups to columns
grps = rep(1:(ncol(mydata)/2),each=2)
result = do.call(rbind,lapply(unique(grps),function(i)tidy(chisq.test(mydata[,grps==i]))))
result
# A tibble: 2 x 4
statistic p.value parameter method
<dbl> <dbl> <int> <chr>
1 0.569 0.989 5 Pearson's Chi-squared test
2 6.89 0.229 5 Pearson's Chi-squared test