Search code examples
rggplot2statisticsmodel-fittinggamma-distribution

How to draw fitted graph and actual graph of gamma distribution in one plot?


Step.1. Load the package needed.

library(ggplot2)
library(MASS)

Step.2. Generate 10,000 numbers fitted to gamma distribution.

x <- round(rgamma(100000,shape = 2,rate = 0.2),1)
x <- x[which(x>0)]

Step.3. Draw the pdf(probability density function), supposed we don't know which distribution x fitted to.

t1 <- as.data.frame(table(x))
names(t1) <- c("x","y")
t1 <- transform(t1,x=as.numeric(as.character(x)))
t1$y <- t1$y/sum(t1[,2])
ggplot() + geom_point(data = t1,aes(x = x,y = y)) + theme_classic()

enter image description here

Step.4. From the graph, we can learn that the distribution of x is quite like gamma distribution, so we use fitdistr() in package MASS to get the parameters of shape and rate of gamma distribution.

fitdistr(x,"gamma") 
##       output 
##       shape           rate    
##   2.0108224880   0.2011198260 
##  (0.0083543575) (0.0009483429)

Step.5. Draw the actual point(black dot) and fitted graph(red line) in the same plot, and here is the question, please look the plot first.

ggplot() + geom_point(data = t1,aes(x = x,y = y)) +     
geom_line(aes(x=t1[,1],y=dgamma(t1[,1],2,0.2)),color="red") + 
theme_classic()

enter image description here

Question 1: The real parameters are shape=2, rate=0.2, and the parameters I use the function fitdistr() to get are shape=2.01, rate=0.20. These two are nearly the same, but why the fitted graph don't fit the actual point well, there must be something wrong in the fitted graph, or the way I draw the fitted graph and actual points is totally wrong, what should I do?

Question 2: After I get the parameter of the model I establish, in which way I evaluate the model, something like RSS(residual square sum) for linear model, or the p-value of shapiro.test() , ks.test() and other test? I am poor in statistical knowledge, please help me out of both the question, thank you!(ps: I have search in the Google and stackoverflow many times, but it does not work, so do not vote this question is useless,Thx!)


Solution

  • For some reason, your fitted line seems to be exactly 10 times to high:

    ggplot() + geom_point(data = t1,aes(x = x,y = y)) +     
      geom_line(aes(x=t1[,1],y=(dgamma(t1[,1],2,0.2))/10),color="red") + 
      theme_classic()
    

    gives a perfect fit: enter image description here

    As jbaums stated this is caused by the density per dx that is 0.1 in this case.