Omit NA values in a chi-squared test


I needed a function to get the p-values of multiple chi-squared tests as a matrix. Searching around, I found this code:

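chisqmatrix <- function(x) {
  names <- colnames(x)
  num <- length(names)
  # Matrix of p-values; only the upper triangle (j > i) gets filled in
  m <- matrix(nrow = num, ncol = num, dimnames = list(names, names))
  for (i in 1:(num - 1)) {
    for (j in (i + 1):num) {
      # Chi-squared test of independence between column i and column j
      m[i, j] <- chisq.test(x[, i], x[, j])$p.value
    }
  }
  return(m)
}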
mat = chisqmatrix(DATAFRAME)
mat

And it works perfectly!

But the problem is that I need this function to omit the NA values.

I can't just omit the NA values from the whole data frame; I need them to be omitted separately for each pair of columns the function compares.
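To make the distinction concrete, here is a small sketch on made-up data (the data frame df and its columns a, b, c are hypothetical). na.omit() on the whole data frame drops every row that has an NA in any column (listwise deletion), while complete.cases() on just the two columns being compared keeps every row where that pair is complete (pairwise deletion).

```r
# Made-up data: each column has NAs in different rows
df <- data.frame(
  a = c("A", "B", NA,  "A", "B", "A"),
  b = c("C", NA,  "C", "D", "D", "C"),
  c = c(NA,  "E", "F", "E", "F", "E")
)

# Listwise deletion: a row is dropped if ANY column is NA
n_listwise <- nrow(na.omit(df))
n_listwise   # 3

# Pairwise deletion: only columns a and b have to be complete
n_pairwise <- sum(complete.cases(df[, c("a", "b")]))
n_pairwise   # 4
```

The pair (a, b) keeps row 1 under pairwise deletion because the NA in that row is in column c, which this pair does not use.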

So when x[,i] selects a column, how can I make it take only the values that are not NA? I tried things like !="NA", but that isn't the right way.
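The reason !="NA" does not work is that it compares against the two-character string "NA", while a missing value is a special value for which comparisons return NA rather than TRUE or FALSE. A minimal sketch with hypothetical vectors v and w, using is.na() instead and building a mask that keeps only the positions where both are present (the same kind of mask could index the rows of a data frame, e.g. x[keep, i]):

```r
v <- c("A", NA, "B")
w <- c("C", "D", NA)

# Comparing with the string "NA" does not detect a missing value:
# the comparison itself yields NA for the missing element
v != "NA"      # TRUE NA TRUE

# is.na() is the test for missing values
!is.na(v)      # TRUE FALSE TRUE

# Keep only the positions where both vectors are non-missing
keep <- !is.na(v) & !is.na(w)
v[keep]        # "A"
w[keep]        # "C"
```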

Thank you!


Solution

  • You really need to provide reproducible data. As documented on its manual page (?chisq.test), when chisq.test is given two vectors x and y, it removes the cases with missing values before building the contingency table:

    set.seed(42)
    x <- matrix(sample(c(LETTERS[1:3], NA), 100, replace=TRUE), 20, 5)
    x <- data.frame(x)
    head(x)
    #     X1   X2   X3 X4   X5
    # 1    A    C    C  A <NA>
    # 2    A    C    C  B    A
    # 3    A    A    B  B    B
    # 4    A    A    B  B    A
    # 5    B    C <NA>  B    A
    # 6 <NA> <NA> <NA>  B <NA>
    x.chi <- chisq.test(x[, 1], x[, 2])
    # Warning message:
    # In chisq.test(x[, 1], x[, 2]) : Chi-squared approximation may be incorrect
    x.chi$observed
    #       x[, 2]
    # x[, 1] A B C
    #      A 3 1 3
    #      B 2 1 2
    sum(x.chi$observed)       # How many observations went into x.chi?
    # [1] 12
    nrow(na.omit(x[, 1:2]))   # How many rows of x[, 1:2] remain after removing NAs?
    # [1] 12
    

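    So your function already does what you want: each call to chisq.test inside the loop drops only the rows where either of the two columns being compared is NA.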
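A quick way to check this with data you control: the sketch below (made-up vectors a and b, not the asker's data) compares the result chisq.test returns when given the NAs with the result after removing incomplete pairs by hand using complete.cases(). If the two agree, the chisqmatrix loop needs no changes, because each iteration passes exactly one pair of columns.

```r
# Made-up vectors with NAs in different positions
a <- c("A", "B", NA,  "A", "B", "A", "B", NA,  "A", "B")
b <- c("C", "C", "D", NA,  "D", "C", "D", "D", "C", NA)

# Let chisq.test drop the incomplete pairs itself
# (suppressWarnings hides the small-sample approximation warning)
auto <- suppressWarnings(chisq.test(a, b))

# Drop the incomplete pairs by hand first
ok <- complete.cases(a, b)          # TRUE where both a and b are non-NA
manual <- suppressWarnings(chisq.test(a[ok], b[ok]))

sum(ok)                             # 6 complete pairs
sum(auto$observed)                  # 6: the same rows were used
all.equal(auto$p.value, manual$p.value)   # TRUE
```

With counts this small the chi-squared approximation is unreliable, which is why R warns; the point here is only that both calls build the same contingency table.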