
Error when using nls for positive coefficient constraint

I'm trying to run a regression with a constraint that forces all coefficients to be non-negative (greater than or equal to zero). To do this, I am using the nls function. However, I am getting an error:

"Error in nls(formula = y ~ . - 1, data = X, start = low, lower = low, : parameters without starting value in 'data': ."

I believe everything is correct here: I set a lower and an upper bound on every coefficient, so I am not sure what is wrong.

Attempt 1:

library(dplyr)  # provides select() and the %>% pipe

X <- data.frame(
    x1 = seq(10),
    x2 = seq(10),
    x3 = seq(10),
    x4 = seq(10),
    x5 = seq(10),
    y = seq(10)
)

low <- dplyr::select(X, -y) %>% names %>% lapply( function(e) 0)
up <-  dplyr::select(X, -y) %>% names %>% lapply( function(e) Inf)
names(low) <- dplyr::select(X, -y) %>% names -> names(up)

fit1 <- nls(formula = y ~ . -1 , data = X,
    start = low,
    lower = low,
    upper = up,
    algorithm = "port"
)

Attempt 2:
Here I build the formula explicitly instead, but then I get a different error:
"Error in qr(.swts * gr) : dims [product 5] do not match the length of object [10]"

library(dplyr)  # provides select() and the %>% pipe

X <- data.frame(
    x1 = seq(10),
    x2 = seq(10),
    x3 = seq(10),
    x4 = seq(10),
    x5 = seq(10),
    y = seq(10)
)

n <- X %>% dplyr::select( -y ) %>% names %>% paste0( collapse = " + " )
f <- "y ~ %s -1" %>% sprintf( n ) %>% as.formula

low <- dplyr::select(X, -y) %>% names %>% lapply( function(e) 0)
up <-  dplyr::select(X, -y) %>% names %>% lapply( function(e) Inf)
names(low) <- dplyr::select(X, -y) %>% names -> names(up)

fit1 <- nls(formula = f , data = X,
    start = low,
    lower = low,
    upper = up,
    algorithm = "port"
)

How can I fix this? Thanks!


  • 1) There are several problems here:

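    • nls does not use the same formula notation as lm: every coefficient has to appear in the formula as a named parameter, so the "." shorthand and "- 1" do not work. In Attempt 1, nls treats "." as a parameter that has no starting value, hence the error. In Attempt 2, the names in start are x1, ..., x5, so nls treats those names as scalar parameters rather than as the data columns; the right-hand side then evaluates to a single number (the sum of five parameters minus 1), which does not match the 10 observations. The formula is fixed below.
    • the example's parameters are not identifiable: all five predictor columns are identical (and equal to y), so any non-negative coefficients that sum to 1 fit the data exactly. The solution is not unique, so the calculation fails. Below we change the example.
    • although starting values of 0 seem to work here, constrained numeric optimization generally works better when the starting values lie in the interior of the feasible region rather than on its boundary.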
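The identifiability problem can be checked directly on the question's data. In this minimal base-R sketch (the names A, rss, b_a and b_b are only for illustration), two different non-negative coefficient vectors give the same zero residual sum of squares, and the design matrix has rank 1:

```r
# Data from the question: five identical predictors, each equal to y
X0 <- data.frame(x1 = seq(10), x2 = seq(10), x3 = seq(10),
                 x4 = seq(10), x5 = seq(10), y = seq(10))
A <- as.matrix(X0[, c("x1", "x2", "x3", "x4", "x5")])

# Residual sum of squares for a coefficient vector b (no intercept)
rss <- function(b) sum((X0$y - A %*% b)^2)

b_a <- c(1, 0, 0, 0, 0)              # all weight on x1
b_b <- c(0.25, 0.25, 0.25, 0.25, 0)  # weight spread over x1..x4

rss(b_a)    # 0
rss(b_b)    # 0: a different coefficient vector with the same perfect fit
qr(A)$rank  # 1: only one independent column, so the coefficients are not unique
```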

    Applying these fixes gives:

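    # Random data, so the parameters are identifiable. No seed is set,
    # so the exact numbers will differ from run to run.
    X <- data.frame(
        x1 = rnorm(10),
        x2 = rnorm(10),
        x3 = rnorm(10),
        x4 = rnorm(10),
        x5 = rnorm(10),
        y = rnorm(10)
    )
    # Every coefficient is written out as a named parameter (b1 ... b5)
    fo <- y ~ b1 * x1 + b2 * x2 + b3 * x3 + b4 * x4 + b5 * x5
    # Start in the interior of the feasible region (all coefficients > 0)
    st <- c(b1 = 1, b2 = 1, b3 = 1, b4 = 1, b5 = 1)
    # lower = numeric(5) is a vector of five zeros; "port" supports bounds
    nls(fo, X, start = st, lower = numeric(5), algorithm = "port")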


    Nonlinear regression model
      model: y ~ b1 * x1 + b2 * x2 + b3 * x3 + b4 * x4 + b5 * x5
       data: X
        b1     b2     b3     b4     b5 
    0.0000 0.1222 0.0000 0.2338 0.1457 
     residual sum-of-squares: 6.477
    Algorithm "port", convergence message: relative convergence (4)
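With many predictors, typing the formula by hand gets tedious. The sketch below builds the same kind of nls formula from the column names, similar in spirit to the question's Attempt 2 but giving each predictor its own named parameter. The b_ prefix and the seed are arbitrary choices for illustration, not part of the answer above:

```r
# Example data as in (1); a seed is set here only so the run is reproducible
set.seed(1)
X <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10),
                x4 = rnorm(10), x5 = rnorm(10), y = rnorm(10))

vars <- setdiff(names(X), "y")   # predictor columns
pars <- paste0("b_", vars)       # one parameter per predictor: b_x1, ..., b_x5

# Right-hand side "b_x1 * x1 + b_x2 * x2 + ..." (no intercept)
rhs <- paste(pars, "*", vars, collapse = " + ")
fo <- as.formula(paste("y ~", rhs))

# Named starting values inside the feasible region, lower bound 0 for each parameter
st <- setNames(rep(1, length(pars)), pars)
fit <- nls(fo, data = X, start = st,
           lower = rep(0, length(pars)), algorithm = "port")
coef(fit)
```

Because the parameter names are derived from the column names, the same code works for any number of predictors; only the column named y is excluded.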

    2) The nnls (non-negative least squares) package can do this directly. We use X defined in (1).

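    library(nnls)  # install.packages("nnls") if it is not installed
    nnls(as.matrix(X[-6]), X$y)  # X[-6] drops column 6 (y), leaving x1..x5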

    giving the following

    Nonnegative least squares model
    x estimates: 0 0.1221646 0 0.2337857 0.1457373 
    residual sum-of-squares: 6.477
    reason terminated: The solution has been computed sucessfully.
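If the nnls package is not available, the two results can still be cross-checked in base R when there are only a few predictors. This sketch is a brute-force method swapped in for illustration, not what nnls does internally (nnls wraps the Lawson-Hanson active-set algorithm): it fits ordinary least squares on every subset of columns, keeps only the fits whose coefficients are all non-negative, and returns the one with the smallest residual sum of squares. The function name nnls_bruteforce and the seed are hypothetical choices, and the cost grows as 2^p, so it is only practical for small p:

```r
# Hypothetical helper: exact NNLS by enumerating column subsets (small p only)
nnls_bruteforce <- function(A, y) {
  p <- ncol(A)
  best_b <- numeric(p)              # empty subset: all coefficients zero
  best_rss <- sum(y^2)
  for (k in seq_len(p)) {
    for (S in combn(p, k, simplify = FALSE)) {
      bS <- qr.solve(A[, S, drop = FALSE], y)  # least squares on columns S
      if (all(bS >= 0)) {                      # keep only feasible fits
        b <- numeric(p)
        b[S] <- bS
        rss <- sum((y - A %*% b)^2)
        if (rss < best_rss) {
          best_b <- b
          best_rss <- rss
        }
      }
    }
  }
  list(coef = best_b, rss = best_rss)
}

set.seed(1)  # arbitrary seed, only for reproducibility
X <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10),
                x4 = rnorm(10), x5 = rnorm(10), y = rnorm(10))
A <- as.matrix(X[-6])               # drop y, keep x1..x5
exact <- nnls_bruteforce(A, X$y)

# Same nls fit as in (1)
fo <- y ~ b1 * x1 + b2 * x2 + b3 * x3 + b4 * x4 + b5 * x5
st <- c(b1 = 1, b2 = 1, b3 = 1, b4 = 1, b5 = 1)
fit <- nls(fo, X, start = st, lower = numeric(5), algorithm = "port")

# Compare coefficients and residual sum of squares
all.equal(exact$coef, unname(coef(fit)), tolerance = 1e-3)
all.equal(exact$rss, deviance(fit), tolerance = 1e-6)
```

This works because the constrained optimum, restricted to its non-zero coefficients, is an ordinary least-squares fit on those columns, so it is always among the candidates that are checked.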