Search code examples
rstatisticsbioinformaticschi-squared

R: applying Pearson's Chi-square test by two columns


I just started coding in R and I have a question about applying Chi-square test to a dataset by 2 columns at a time.

I would like to do a paired analysis (Tumor and Normal sample come from the same patient, so Primary Tumor 1 and Normal Tissue 1 comes from the same patient). I would like to see differences in the distribution between tumour and normal sample from the same patient and apply to all 50 patients.

I tried Chi-square goodness of fit previously, with expected probability I calculated from taking average from all normal samples.

The code I used is:

apply(mydata, 2, chisq.test, p=myprobability)

This time, I want to conduct Pearson's Chi-square test (not goodness of fit) to tumour and its matched normal tissue.

So, I would like to run Chi-square test by two columns: Primary Tumor 1 + Normal 1 ... Then next, Primary Tumor 2 + Normal 2

and get a table of Chi-square statistics and p-values. (In this case, I would have to use adjusted p-values right? because I ran it on 50 sets of samples?)

My data looks like this: enter image description here

As a reproductible example...

mydata <-
structure(list(Tumor1 = c(17, 28, 80, 63, 20, 
10), Normal1 = c(18, 27, 89, 62, 24, 
11), Tumor2 = c(25, 40, 80, 65, 23, 
11), Normal2 = c(27, 29, 100, 72, 34, 
6)), class = "data.frame", 
row.names = c("trim3", "trim2", "trim1", "add1", "add2", 
"add3"))

head(mydata)

      Tumor1 Normal1 Tumor2 Normal2
trim3     17      18     25      27
trim2     28      27     40      29
trim1     80      89     80     100
add1      63      62     65      72
add2      20      24     23      34
add3      10      11     11       6

I tried to use apply function like I did for goodness of fit, but I could not get it to work.

Thank you


Solution

  • You can consider doing a Cochran–Mantel–Haenszel Test which is a test for the independence of two variables with repeated measurements, in your case, different tumour / normal pairs. So using your example, we get an array first:

    test = array(unlist(mydata),dim=c(nrow(mydata),2,ncol(mydata)/2))
    test
    , , 1
    
         [,1] [,2]
    [1,]   17   18
    [2,]   28   27
    [3,]   80   89
    [4,]   63   62
    [5,]   20   24
    [6,]   10   11
    
    , , 2
    
         [,1] [,2]
    [1,]   25   27
    [2,]   40   29
    [3,]   80  100
    [4,]   65   72
    [5,]   23   34
    [6,]   11    6
    

    Then do:

    mantelhaen.test(test)
    
        Cochran-Mantel-Haenszel test
    
    data:  test
    Cochran-Mantel-Haenszel M^2 = 5.0277, df = 5, p-value = 0.4125
    

    Of course you can test each sample pair individually:

    library(broom)
    # assign groups to columns
    grps = rep(1:(ncol(mydata)/2),each=2)
    result = do.call(rbind,lapply(unique(grps),function(i)tidy(chisq.test(mydata[,grps==i]))))
    result
    
    # A tibble: 2 x 4
      statistic p.value parameter method                    
          <dbl>   <dbl>     <int> <chr>                     
    1     0.569   0.989         5 Pearson's Chi-squared test
    2     6.89    0.229         5 Pearson's Chi-squared test