Search code examples
roptimizationstatistics

Produced invalid scale when simulated Generalized Pareto Distribution to find mse rmse and bias


I am working on Generalized Pareto Distribution in R and I want to calculate the bias, mse and rmse of the disitribtion. However when I'm using the mle function the following error occured.

Error in dgpd(x = samples, scale = scale, shape = shape, log = TRUE) : 
  invalid scale

The code is as following:

library(stats4)
library(evd)
gpd_simulation<-function(N,n,scale,shape){
  conc<-numeric(N)
  for(i in 1:N){
    samples=rgpd(n,scale=scale,shape=shape)
    gpdlik<-function(scale,shape){
      -sum(dgpd(x=samples,scale=scale,shape=shape,log=TRUE))
    }
    result<-mle(minuslogl=gpdlik,start=list(scale=11.63599901,shape=0.1),lower=c(0,0),method="L-BFGS-B")
    conc[i]<-result@coef
  }
  conc
}

summarygpd1<-function(conc){
  bias=mean(conc)-3;bias
  mse=var(conc)+bias^2;mse
  rmse=sqrt(var(conc)+bias^2);rmse
  return(c(bias,mse,rmse))
}

simulate40<-gpd_simulation(1000,40,11.63599901,0.1);simulate40
summarygpd1(simulate40)

However, if i use my own dataset, the following error would not occur, is there something wrong with using rgpd this way?

Just to add, if I increase my sample size, the codes run fine, I increase it to 80, 100 and 120, The code run with no error, but 40 and 60 the error will occur again.


Solution

  • When I run your code, it fails when the mle() function tries a scale that is exactly zero. It never chooses an MLE that's near zero, so it's probably simplest to set the lower limit on the parameter to a small positive number instead of using zero, e.g. setting the lower limit to 0.01 works for me with warnings but no error.

    The warnings come from the line conc[i] <- result@coef. At this point result@coef is length 2, and conc[i] can only hold a single number. If you only want to save the scale, use conc[i] <- result@coef[1].

    This code incorporates both fixes:

    gpd_simulation<-function(N,n,scale,shape){
      conc<-numeric(N)
      for(i in 1:N){
        samples=rgpd(n,scale=scale,shape=shape)
        gpdlik<-function(scale,shape){ 
          -sum(dgpd(x=samples,scale=scale,shape=shape,log=TRUE))
        }
        result<-mle(minuslogl=gpdlik,start=list(scale=11.63599901,shape=0.1),lower=c(0.01,0),method="L-BFGS-B")
        conc[i]<-result@coef[1]
      }
      conc
    }
    

    A more general solution is to figure out what should happen in the limit as scale tends to zero, and have gpdlik return that value when scale is exactly zero.