Search code examples
rzerodata-fittingweibull

Data series, how can i fit a distribution in R?


I have some problems with a Data series because it has some zero values so there are some distributions who don't fit it. I've tried with the function fitdistand fitdistr but no one works. There are my data:

 Precp
28
8
0
107
339
231
308
226
302
333
163
92
48
17
101
327
424
338
559
184
238
371
413
261
12
24
103
137
300
446
94
313
402
245
147
70
8
5
59
109,2
493,6
374,5
399,3
330,5
183,8
341,1
91
127,5
15
69
165,8
337,9
255
309,3
352,7
437,5
420,4
295,6
141,7
3,4
16,2
58,9
55,5
203,1
235
300
264,5
320,5
401,5
500,2
149
100
12
110
53,5
70
661,5
86
499,6
154,5
367
142
177
435
64
287,3
210,3
324,7
288,8
0
0
0
0
0
0
0
76,2
53
59,6
176,5
263,1
285,3
423,9
387
367,9
243,9
94
38
50
31
177
180
264
326
204
463,4
255,6
336,4
436,8
139
5
98
180
275,8
415,2
351,7
369
324
249
296
267
102
4
51
123
358,2
394
470
260
288
502
322
597
216
18,9
26
98
311,5
237,5
278
296
387,5
274,2
458,1
0
0
99,6
69,3
152,7
189
319,8
217,9
280,2
250,1
275,2
275
117,5
0

When I try to fit a distribution, for example Weibull this is the message that appears:

> fw=fitdist(Precp,"weibull")
[1] "Error in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data,  : \n  non-finite value supplied by optim\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data,     ddistnam = ddistname, hessian = TRUE, method = meth, lower = lower,     upper = upper, ...): non-finite value supplied by optim>
Error in fitdist(Precp, "weibull") : 
  the function mle failed to estimate the parameters, 
                with the error code 100

And the same thing happens when I try to use the gamma distribution. Any idea what happened there?


Solution

  • If you want to fit an extreme value distribution such as the Weibull distribution, you can try the evd package:

    library(evd)
    > fit <- fgev(dat$Precp)
    > fit
    
    Call: fgev(x = dat$Precp) 
    Deviance: 2159.363 
    
    Estimates
         loc     scale     shape  
    151.9567  137.6544   -0.1518  
    
    Standard Errors
         loc     scale     shape  
    12.41071   9.24535   0.07124  
    
    Optimization Information
      Convergence: successful 
      Function Evaluations: 27 
      Gradient Evaluations: 15 
    

    If you are not interested in a parametric distribution, you may consider the density function, which computes a kernel density.

    Since your data seems to contain many small values, you may consider mixing two distributions. The flexmix package can do that for you.

    hist(dat$Precp,prob=T,col="gray", ylim=c(0,0.0042), breaks=seq(0,700, by=50)
        xlab="", ylab="", main="")
    legend("topright", 
        legend=c("density", "fgev", "flexmix"), 
        fill=c("darkgreen", "blue", "darkred")
    )
    xval <- seq(from=0, to=max(dat$Precp), length.out=200)
    
    # density
    fit1 <- density(dat$Precp)
    lines(fit1, col="darkgreen", lwd=2)
    
    # generalized extreme value distribution
    fit2 <- fgev(dat$Precp)
    param2 <- fit2$estimate
    loc <- param2[["loc"]]
    scal <- param2[["scale"]]
    shape <- param2[["shape"]]
    lines(xval, dgev(xval, loc=loc, scale=scal, shape=shape), col="blue", lwd=2)
    
    # mixture of two Gamma distributions
    # http://r.789695.n4.nabble.com/Gamma-mixture-models-with-flexmix-tt3328929.html#none
    fit3 <- flexmix(Precp~1, data=subset(dat, Precp>0), k=2, 
        model = list(FLXMRglm(family = "Gamma"), FLXMRglm(family = "Gamma"))
    )
    param3 <- parameters(fit3)[[1]] # don't know why this is a list
    interc <- param3[1,]
    shape <- param3[2,]
    lambda <- prior(fit3)
    yval <- lambda[[1]]*dgamma(xval, shape=shape[[1]], rate=interc[[1]]*shape[[1]]) + 
            lambda[[2]]*dgamma(xval, shape=shape[[2]], rate=interc[[2]]*shape[[2]])
    lines(xval, yval, col="darkred", lwd=2)
    

    device output