Search code examples
ranovapoisson

Test for Poisson residuals in the analysis of variance model


I try to find any way for test Poisson residuals like normals in aov(). In my hypothetical example:

# For normal distribution
x <- rep(seq(from=10, to=50, by=0.5),6)
y1 <- rnorm(length(x), mean=10, sd=1.5)

#Normality test in aov residuals
y1.av<-aov(y1 ~ x)
shapiro.test(y1.av$res)
#   Shapiro-Wilk normality test
#
#data:  y1.av$res
#W = 0.99782, p-value = 0.7885

Sounds silly, OK!!

Now, I'll like to make a same approche but for Poisson distribution:

# For Poisson distribution
x <- rep(seq(from=10, to=50, by=0.5),6)
y2 <- rpois(x, lambda=10)

#Normality test in aov residuals
y2.av<-aov(y2 ~ x)
poisson.test(y2.av$res)
Error in poisson.test(y2.av$res) : 
  'x' must be finite, nonnegative, and integer

There is any stat approach for make this?

Thanks!


Solution

  • You could analyse your data below a counting context. Discrete data, such as variables of Poisson nature, can be analysed based on observed frequencies. You can formulate hypothesis testing for this task. Being your data y you can contrast the null hypothesis that y follows a Poisson distribution with some parameter lambda against the alternative hypothesis that y does not come from the Poisson distribution. Let's sketch the test with you data:

    #Data
    set.seed(123)
    # For Poisson distribution
    x <- rep(seq(from=10, to=50, by=0.5),6)
    y2 <- rpois(x, lambda=10)
    

    Now we obtain the counts, which are elemental for the test:

    #Values
    df <- as.data.frame(table(y2),stringsAsFactors = F)
    df$y2 <- as.integer(df$y2)
    

    After that we must separate the observed values O and its groups or categories classes. Both elements constitute the y variable:

    #Observed values
    O <- df$Freq
    #Groups
    classes <- df$y2
    

    As we are testing a Poisson distribution, we must compute the lambda parameter. This can be obtained with Maximum Likelihood Estimation (MLE). The MLE for Poisson is the mean (considering we have counts and groups in order to determine this value), so we compute it with next code:

    #MLE
    meanval <- sum(O*classes)/sum(O)
    

    Now, we have to get the probabilities of each class:

    #Probs
    prob <- dpois(classes,meanval)
    

    Poisson distribution can go to infinite values, so we must compute the probability for the values that can be greater than our last group in order to have probabilities that sum to one:

    prhs <- 1-sum(prob)
    

    This probability can be easily added to the last value of our group in order to transform to account for values greater or equal to it (For example, instead of only having the probability that y equals to 20 we can have the probability that y is greater or equal to 20):

    #Add probability
    prob[length(prob)]<-prob[length(prob)]+prhs 
    

    With this we can conduct a goodness of fit test using chisq.test() function in R. It requires the observed values O and the probabilities prob that we have computed. Just a reminder that this test uses to set wrong degrees of freedom, so we can correct it by the formulation of the test that uses k-q-1 degrees. Where k is the number of groups and q is the number of parameters computed (we have computed one parameter with MLE). Next the test:

    chisq.test(O,p=prob)
    

    The output:

    Chi-squared test for given probabilities
    
    data:  O
    X-squared = 7.6692, df = 17, p-value = 0.9731
    

    The key value from the test is the X-squared value which is the test statistic. We can reuse the value to obtain the real p-value (In our example, we have k=18 and minus 2, the degrees of freedom are 16).

    The p.value can be obtained with next code:

    p.value <- 1-pchisq(7.6692, 16)
    

    The output:

    [1] 0.9581098
    

    As this value is not greater that known significance levels we do not reject the null hypothesis and we can affirm that y comes from a Poisson distribution.