
Error when fitting a GLM: Error in eval(family$initialize)


I am trying to fit the generalized linear model defined below.

Note that both the response variable Var1 and the regressor Var2 contain zeros, so a constant has been added to each of them to avoid problems when taking the log:

model = glm(Var1 + 2 ~ log(Var2 + 2) + offset(log(Var3/Var4)),
            family = gaussian(link = "log"), data = data2)

However, when I try to produce the diagnostic half-normal plot with the hnp function, I get the following error:

library(hnp)
hnp(model)
Gaussian model (glm object) 
Error in eval(family$initialize) : 
  cannot find valid starting values: please specify some
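The message is raised by the `initialize` expression of R's `gaussian()` family: when the link is `"log"` and none of `start`, `etastart` or `mustart` is supplied, it stops if any response value is less than or equal to zero. A minimal sketch on hypothetical toy data (`x`, `y` and `y_bad` are made up and only stand in for `data2`) reproduces the message and shows that passing `start` skips this particular check:

```r
# Hypothetical toy data, not the asker's data2: strictly positive response
set.seed(1)
x <- seq(0, 3, length.out = 30)
y <- exp(0.5 + 0.3 * x) + rnorm(30, sd = 0.2)

# The check lives in the family object itself
print(gaussian(link = "log")$initialize)

# All responses > 0: glm() initialises with mustart <- y and fits normally
ok <- glm(y ~ x, family = gaussian(link = "log"))

# One non-positive response and no starting values: same message as in the question
y_bad <- y
y_bad[1] <- -0.1
err <- tryCatch(
  glm(y_bad ~ x, family = gaussian(link = "log")),
  error = function(e) conditionMessage(e)
)
print(err)

# Supplying start (coefficients on the log, i.e. link, scale) bypasses that check
fixed <- glm(y_bad ~ x, family = gaussian(link = "log"), start = coef(ok))
print(coef(fixed))
```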

To work around this, I tried implementing the simulation manually and then building the plot from it, but the same error message appears:

dfun <- function(obj) resid(obj)

sfun <- function(n, obj) simulate(obj)[[1]]

ffun <- function(resp) glm(resp ~ log(Var2 + 2) + offset(log(Var3/Var4)),
                           family = gaussian(link = "log"), data = data2)

hnp(model, newclass = TRUE, diagfun = dfun, simfun = sfun, fitfun = ffun)

Error in eval(family$initialize) : 
  cannot find valid starting values: please specify some
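Since every observed value of Var1 + 2 is positive, the original fit succeeds, so the failure more plausibly happens in the refits. The sketch below assumes that hnp, like the `sfun`/`ffun` pair above, simulates Gaussian responses around the fitted means and refits the model without `start`. On hypothetical toy data it draws each response as fitted mean plus normal noise with the estimated residual standard deviation (a hand-written stand-in for `simulate()`), then records how often a draw contains a value <= 0 and whether the refit stops with an error:

```r
# Hypothetical toy data standing in for data2: strictly positive response
# whose residual noise is comparable in size to its mean
set.seed(42)
n <- 60
x <- runif(n, 0, 2)
y <- abs(exp(0.3 + 0.3 * x) + rnorm(n, sd = 1)) + 0.01

# The fit to the observed data works, because every observed y is > 0
fit <- glm(y ~ x, family = gaussian(link = "log"))

# Estimated residual standard deviation of the Gaussian model
sigma_hat <- sqrt(sum(residuals(fit, type = "response")^2) / df.residual(fit))

n_sim <- 99
has_nonpos <- logical(n_sim)
refit_fails <- logical(n_sim)
for (i in seq_len(n_sim)) {
  # One simulated response: fitted mean plus Gaussian noise
  y_sim <- fitted(fit) + rnorm(n, sd = sigma_hat)
  has_nonpos[i] <- any(y_sim <= 0)
  # Refit with no starting values, as a default refit would
  refit <- try(glm(y_sim ~ x, family = gaussian(link = "log")), silent = TRUE)
  refit_fails[i] <- inherits(refit, "try-error")
}
cat("draws containing a value <= 0:", sum(has_nonpos), "of", n_sim, "\n")
cat("refits that stopped with an error:", sum(refit_fails), "of", n_sim, "\n")
```

This would also explain why removing zeros from the observed data does not help: the non-positive values come from the simulated responses, not from the data.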

Following some guidance I found, I also tried supplying starting values to the estimation algorithm, both for the linear predictor and for the means, but this was not enough to solve the problem. Here is the code:

fit = lm(Var1+2 ~ log(Var2+2) + offset(log(Var3/Var4)), data=data2)
coefficients(fit)
 (Intercept) log(Var2+2)
    32.961103     -8.283306

model = glm(Var1 + 2 ~ log(Var2 + 2) + offset(log(Var3/Var4)),
            family = gaussian(link = "log"), start = c(32.96, -8.28), data = data2)
hnp(model)

Error in eval(family$initialize) : 
  cannot find valid starting values: please specify some 
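`start` gives values for the coefficients of the linear predictor, so for a log-link model it is read on the log scale. The `lm()` coefficients above come from the untransformed `Var1 + 2`, i.e. the response scale; read as a log-scale intercept, 32.96 implies enormous initial means. One hedged alternative is to take starting values from a linear fit of the log response with the same offset. The sketch uses a hypothetical data frame `d` with the same column names as `data2`; `init` and `st` are illustrative names:

```r
# Hypothetical data with the same column names as data2
set.seed(7)
n <- 50
d <- data.frame(Var2 = rpois(n, 3), Var3 = runif(n, 1, 5), Var4 = runif(n, 1, 5))
d$Var1 <- rpois(n, with(d, exp(0.5 + 0.4 * log(Var2 + 2)) * Var3 / Var4))

# Starting values on the link (log) scale: regress log(Var1 + 2) on the
# predictor, keeping the same offset as the GLM
init <- lm(log(Var1 + 2) ~ log(Var2 + 2) + offset(log(Var3/Var4)), data = d)
st <- coef(init)
print(st)

# Log-link Gaussian GLM started from those values
fit <- glm(Var1 + 2 ~ log(Var2 + 2) + offset(log(Var3/Var4)),
           family = gaussian(link = "log"), data = d, start = st)
print(coef(fit))
```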

The error also persists when I implement the half-normal plot manually with these starting values:

dfun <- function(obj) resid(obj)

sfun <- function(n, obj) simulate(obj)[[1]]

ffun <- function(resp) glm(resp ~ log(Var2 + 2) + offset(log(Var3/Var4)),
                           family = gaussian(link = "log"), data = data2, start = c(32.96, -8.28))

hnp(model, newclass = TRUE, diagfun = dfun, simfun = sfun, fitfun = ffun)

Error in eval(family$initialize) : 
  cannot find valid starting values: please specify some

I also tried refitting the model after removing the zeros from the data, but the problem persists.


Solution

  • I suspect what you meant to fit is a log-transformed response variable against your predictors, rather than a GLM with a log link. The two are not the same: with a log link you model the mean as exp(linear predictor) while the Gaussian errors stay additive on the original scale, whereas with a log-transformed response the errors are additive on the log scale (multiplicative on the original scale). I am not very familiar with hnp, but my guess is that there are problems simulating the response variable.

    If I run your regression like this, using the data you provided, the plot looks OK:

    data2$Y = with(data2, log((Var1 + 2) / (Var3/Var4)))

    model = glm(Y ~ log(Var2 + 2), data = data2)
    hnp(model)
    

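The offset log(Var3/Var4) can either stay in the formula or be subtracted from the log response beforehand; both give the same coefficients. With the default identity link there is also no positivity check in `initialize`, so a simulated response containing negative values refits without error. A minimal sketch on a hypothetical data frame `d` (same column names as `data2`; `m1`, `m2`, `m_sim` are illustrative names):

```r
# Hypothetical data with the same column names as data2
set.seed(3)
n <- 40
d <- data.frame(Var1 = rpois(n, 4), Var2 = rpois(n, 3),
                Var3 = runif(n, 1, 5), Var4 = runif(n, 1, 5))

# (a) Log-transformed response, offset kept in the formula
m1 <- glm(log(Var1 + 2) ~ log(Var2 + 2) + offset(log(Var3/Var4)), data = d)

# (b) Offset subtracted from the log response beforehand
d$Y <- with(d, log(Var1 + 2) - log(Var3/Var4))
m2 <- glm(Y ~ log(Var2 + 2), data = d)

print(all.equal(coef(m1), coef(m2)))

# Identity link: a simulated response with negative values refits fine
y_sim <- fitted(m2) + rnorm(n, sd = 2)
m_sim <- glm(y_sim ~ log(Var2 + 2), data = d)
print(coef(m_sim))
```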