Search code examples
rdistributionestimationprobability-densityweibull

Fit distribution to given frequency values in R


I have frequency values changing with the time (x axis units), as presented on the picture below. After some normalization these values may be seen as data points of a density function for some distribution.

Q: Assuming that these frequency points are from Weibull distribution T, how can I fit best Weibull density function to the points so as to infer the distribution T parameters from it?

sample <- c(7787,3056,2359,1759,1819,1189,1077,1080,985,622,648,518,
            611,1037,727,489,432,371,1125,69,595,624)

plot(1:length(sample), sample, type = "l")
points(1:length(sample), sample)

enter image description here

Update. To prevent from being misunderstood, I would like to add little more explanation. By saying I have frequency values changing with the time (x axis units) I mean I have data which says that I have:

  • 7787 realizations of value 1
  • 3056 realizations of value 2
  • 2359 realizations of value 3 ... etc.

Some way towards my goal (incorrect one, as I think) would be to create a set of these realizations:

# Loop to simulate values 
set.values <- c()
for(i in 1:length(sample)){
  set.values <<- c(set.values, rep(i, times = sample[i]))
}

hist(set.values)
lines(1:length(sample), sample)
points(1:length(sample), sample)

enter image description here

and use fitdistr on the set.values:

f2 <- fitdistr(set.values, 'weibull')
f2

Why I think it is incorrect way and why I am looking for a better solution in R?

  • in the distribution fitting approach presented above it is assumed that set.values is a complete set of my realisations from the distribution T

  • in my original question I know the points from the first part of the density curve - I do not know its tail and I want to estimate the tail (and the whole density function)


Solution

  • First try with all points

    Second try with first point dropped Here is a better attempt, like before it uses optim to find the best value constrained to a set of values in a box (defined by the lower and upper vectors in the optim call). Notice it scales x and y as part of the optimization in addition to the Weibull distribution shape parameter, so we have 3 parameters to optimize over.

    Unfortunately when using all the points it pretty much always finds something on the edges of the constraining box which indicates to me that maybe Weibull is maybe not a good fit for all of the data. The problem is the two points - they ares just too large. You see the attempted fit to all data in the first plot.

    If I drop those first two points and just fit the rest, we get a much better fit. You see this in the second plot. I think this is a good fit, it is in any case a local minimum in the interior of the constraining box.

    library(optimx)
    sample <- c(60953,7787,3056,2359,1759,1819,1189,1077,1080,985,622,648,518,
                611,1037,727,489,432,371,1125,69,595,624)
    t.sample <- 0:22
    
    s.fit <- sample[3:23]
    t.fit <- t.sample[3:23]
    
    wx <- function(param) { 
      res <- param[2]*dweibull(t.fit*param[3],shape=param[1])
      return(res)
    } 
    minwx <- function(param){
      v <- s.fit-wx(param)
      sqrt(sum(v*v))
    }
    
    p0 <- c(1,200,1/20)
    paramopt <- optim(p0,minwx,gr=NULL,lower=c(0.1,100,0.01),upper=c(1.1,5000,1))
    
    popt <- paramopt$par
    popt
    rms <- paramopt$value
    tit <- sprintf("Weibull - Shape:%.3f xscale:%.1f  yscale:%.5f rms:%.1f",popt[1],popt[2],popt[3],rms)
    
    plot(t.sample[2:23], sample[2:23], type = "p",col="darkred")
    lines(t.fit, wx(popt),col="blue")
    title(main=tit)