R optim(){fExtremes} gets 0 hessian matrix

I am using R {fExtremes} to find best parameters of GEV distribution for my data (a vector). but get the following error message

Error in solve.default(fit$hessian) : Lapack routine dgesv: system is exactly singular: U[1,1] = 0

I traced back to fit$hessian, found my hessian matrix is a sigular matrix, all of the elements are 0s. The source code (https://github.com/cran/fExtremes/blob/master/R/GevFit.R) of gevFit() shows fit$hessian is calculated by optim(). The output parameters are exactly the same value as the initial parameters. I am wondering what could be the problems of my data that cause this problem? I copied my code here

> min(sample);
[1] 5.240909

> max(sample)
[1] 175.8677

> length(sample)
[1] 6789

> mean(sample)
[1] 78.04107

>para<-gevFit(sample, type = "mle")
Error in solve.default(fit$hessian) : 
  Lapack routine dgesv: system is exactly singular: U[1,1] = 0

fit = optim(theta, .gumLLH, hessian = TRUE, ..., tmp = data)
> fit

   $par

xi   -0.3129225
mu   72.5542497 
beta  16.4450897 

$value
[1] 1e+06

$counts
function gradient 
       4       NA 

$convergence
[1] 0

$message
NULL



$hessian

     xi  mu beta

xi    0    0     0

mu    0    0      0

beta  0     0      0

I updated my dataset on google docs: https://docs.google.com/spreadsheets/d/1IRRpjmdrrJPhNmfiLism_P0efV_Ot4HlEsa6kwMnljc/edit?usp=sharing

Solution

This is going to be a long story, possibly more suited to https://stats.stackexchange.com/.

====== Part 1 -- The problem ======

This is the sequence generating the error:

library(fExtremes)
samp <- read.csv("optimdata.csv")[ ,2]
## does not converge
para <- gevFit(samp, type = "mle")

We are facing the typical cause of lack-of-convergence when using optim() and friends: inadequate starting values for the optimisation.

To see what goes wrong, let us use the PWM estimator (http://arxiv.org/abs/1310.3222); this consists of an analytical formula, hence it does not incur into convergence problems, since it makes no use of optim():

para <- gevFit(samp, type = "pwm")
fitpwm<- attr(para, "fit")
fitpwm$par.ests

The estimated tail parameter xi is negative, corresponding to a bounded upper tail; in fact the fitted distribution displays even more "upper tail boundedness" than the sample data, as you can see from the "leveling off" of the quantile-quantile graph at the right:

qqgevplot <- function(samp, params){
  probs <- seq(0.1,0.99,by=0.01)
  qqempir <- quantile(samp, probs)
  qqtheor <-  qgev(probs, xi=params["xi"], mu=params["mu"], beta=params["beta"])
  rang <- range(qqempir,qqtheor)
  plot(qqempir, qqtheor, xlim=rang, ylim=rang,
     xlab="empirical", ylab="theoretical",
     main="Quantile-quantile plot")
  abline(a=0,b=1, col=2)
}
qqgevplot(samp, fitpwm$par.ests)

For xi<0.5 the MLE estimator is not regular (http://arxiv.org/abs/1301.5611): the value of -0.46 estimated by PWM for xi is very close to that. Now the PWM estimates are used internally by gevFit() as starting values for optim(): you can see this if you print out the code for the function gevFit():

print(gevFit)
print(.gevFit)
print(.gevmleFit)

The starting value for optim is theta, obtained by PWM. For the specific data at hand, this starting value is not adequate, in that it leads to non-convergence of optim().

====== Part 2 -- solutions? ======

Solution 1 is to use para <- gevFit(samp, type = "pwm") as above. If you'd like to use ML, then you have to specify good starting values for optim(). Unfortunately, the fExtremes package does not make it easy to do so. You can then re-define your own version of .gevmleFit to include those, e.g.

.gevmleFit <- function (data, block = NA, start.param, ...) 
{
  data = as.numeric(data)
  n = length(data)
  if(missing(start.param)){
    theta = .gevpwmFit(data)$par.ests
  }else{
    theta = start.param
  }
  fit = optim(theta, .gevLLH, hessian = TRUE, ..., tmp = data)
  if (fit$convergence) 
    warning("optimization may not have succeeded")
  par.ests = fit$par
  varcov = solve(fit$hessian)
  par.ses = sqrt(diag(varcov))
  ans = list(n = n, data = data, par.ests = par.ests, par.ses = par.ses, 
             varcov = varcov, converged = fit$convergence, nllh.final = fit$value)
  class(ans) = "gev"
  ans
}
## diverges, just as above
.gevmleFit(samp)
## diverges, just as above
startp <- fitpwm$par.ests
.gevmleFit(samp, start.param=startp)
## converges
startp <- structure(c(-0.1, 1, 1), names=names(fitpwm$par.ests))
.gevmleFit(samp, start.param=startp)$par.ests

Now check this out: the beta estimated by PWM is 0.1245; by changing this to a tiny amount, the MLE is made to converge:

startp <- fitpwm$par.ests
startp["beta"]
startp["beta"] <- 0.13
.gevmleFit(samp, start.param=startp)$par.ests

This hopefully clearly illustrates that to blindly optim()ise works until it doesn't and might then turn into a quite delicate endeavour indeed. For this reason, it might be useful to leave this reply here, rather than to migrate to CrossValidated.