
Chi-Squared Test Returning Infinity


I fitted a Poisson distribution to some data. The observed counts and the estimated probabilities are as follows:

observed <- c(290, 630, 873, 853, 618, 310, 138, 54, 21, 9, 4)
estimated_prob_mass <- c(0.064, 0.176, 0.242, 0.222, 0.152, 0.084, 0.038, 0.015, 0.005, 0.002, 0.000)

Visually, the scaled distribution fits the data really well. I used the chi-squared goodness-of-fit test to check the fit and got the following result:

chisq.test(observed, p=estimated_prob_mass)

#Warning message in chisq.test(observed, p = estimated_prob_mass):
#"Chi-squared approximation may be incorrect"

#Chi-squared test for given probabilities

#data:  observed
#X-squared = Inf, df = 10, p-value < 2.2e-16

Why would I get an infinite chi-squared value and a near-zero p-value in this case?


Solution

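  • You provided an estimated probability of 0 for one of the cells (the last one), but you observed 4 counts there. The expected count for that cell is 0, so its contribution to the statistic, (observed - expected)^2 / expected, divides by zero and comes out infinite. Put another way, if you get a non-zero count in a cell whose probability is 0, the test has to reject, because under your probabilities that result is impossible. Changing the probability vector so that the last probability is .001 and then renormalizing the whole vector gives a more sensible result.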

    > observed <- c(290, 630, 873, 853, 618, 310, 138, 54, 21, 9, 4)
    > estimated_prob_mass <- c(0.064, 0.176, 0.242, 0.222, 0.152, 0.084, 0.038, 0.015, 0.005, 0.002, 0.000)
    > e <- estimated_prob_mass
    > e[11] <- .001
    > e <- e/sum(e) 
    > 
    > # Let's compare the probabilities provided versus the new ones
    > estimated_prob_mass
     [1] 0.064 0.176 0.242 0.222 0.152 0.084 0.038 0.015 0.005 0.002 0.000
    > round(e, 3)
     [1] 0.064 0.176 0.242 0.222 0.152 0.084 0.038 0.015 0.005 0.002 0.001
    > 
    > chisq.test(observed, p = e)
    
            Chi-squared test for given probabilities
    
    data:  observed
    X-squared = 17.748, df = 10, p-value = 0.05936
    
    Warning message:
    In chisq.test(observed, p = e) : Chi-squared approximation may be incorrect
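
    To see where the `Inf` in the original call comes from, here is a minimal sketch that computes the statistic by hand from the vectors in the question. The names `expected` and `contrib` are just illustrative; this is the textbook Pearson formula, not a copy of what `chisq.test` does internally.

    ```r
    observed <- c(290, 630, 873, 853, 618, 310, 138, 54, 21, 9, 4)
    estimated_prob_mass <- c(0.064, 0.176, 0.242, 0.222, 0.152, 0.084,
                             0.038, 0.015, 0.005, 0.002, 0.000)

    # Expected counts under the null: total sample size times each probability
    expected <- sum(observed) * estimated_prob_mass

    # Per-cell contributions to the Pearson statistic
    contrib <- (observed - expected)^2 / expected

    expected[11]  # 0, because the last probability is 0
    contrib[11]   # (4 - 0)^2 / 0, which R evaluates to Inf
    sum(contrib)  # Inf, matching X-squared = Inf in the question
    ```

    The same `expected` vector also explains the warning: with the last probability set to .001, the last expected count is still below 5, which is the usual rule of thumb for when the chi-squared approximation becomes unreliable.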
    

    The main takeaway is that your probability vector is either completely accurate, in which case you absolutely should reject the null because you observed something it says cannot happen, or it doesn't actually make sense as stated. Notice also how sensitive the result is: changing a single probability by .001 moved the test from rejecting the null emphatically to not rejecting it at the 5% level (p = 0.059). If you think the original vector should make sense and you don't understand the result you got, rethink how you set up the test and consult a statistician.
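
    One way to avoid a zero cell without picking .001 by hand is to use unrounded Poisson probabilities and fold the upper tail into the last cell. The sketch below rests on assumptions the question does not state: that the 11 cells are the counts 0 through 10, that the last cell can be read as "10 or more", and that the rate was estimated by the sample mean. The names `k`, `lambda_hat` and `p` are illustrative.

    ```r
    observed <- c(290, 630, 873, 853, 618, 310, 138, 54, 21, 9, 4)

    # ASSUMPTION: the cells correspond to the counts 0, 1, ..., 10
    k <- 0:10

    # ASSUMPTION: the Poisson rate is estimated by the sample mean
    lambda_hat <- sum(k * observed) / sum(observed)

    # Unrounded Poisson probabilities for each cell
    p <- dpois(k, lambda_hat)

    # ASSUMPTION: treat the last cell as "10 or more" so the vector sums to 1
    p[11] <- 1 - sum(p[1:10])

    # The small-expected-count warning may still appear for the last cell
    res <- chisq.test(observed, p = p)
    res
    ```

    Because every probability is now strictly positive, the statistic is finite. Whether this is the right null for your data is still a modeling question, so treat the output as a starting point rather than a verdict.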