
Why is the gradient of the first iteration step singular in nls with a bivariate normal mean function?


I am trying to fit a non-linear regression model whose mean function is the bivariate normal distribution function (pmnorm from the mnormt package), i.e. the fitted value for each observation is Phi2(qnorm(x1), qnorm(x2); rho). The only parameter to estimate is the correlation rho. The problem: "gradient of first iteration step is singular". Why? Here is a small example with simulated data.

# pmnorm() comes from the mnormt package
library(mnormt)

# given values for the independent variables
x1 <- c(rep(0.1,5), rep(0.2,5), rep(0.3,5), rep(0.4,5), rep(0.5,5))
x2 <- c(rep(c(0.1,0.2,0.3,0.4,0.5),5))


## 1 generate values for dependent variable (incl. error term)
#  from bivariate normal distribution with assumed correlation rho=0.5

fun  <- function(b) pmnorm(x = c(qnorm(x1[b]), qnorm(x2[b])), 
                           mean = c(0, 0), 
                           varcov = matrix(c(1, 0.5, 0.5, 1), nrow = 2))

set.seed(123)
y <- sapply(1:25,  function(b) fun(b)) + runif(25)/1000  


# put it in data frame
dat <- data.frame(y=y, x1=x1, x2=x2 )




# 2 : calculate non-linear regression from the generated data
# use rho=0.51 as starting value

fun <- function(x1, x2,rho) pmnorm(x = c(qnorm(x1), qnorm(x2)), 
                                       mean = c(0, 0), 
                                       varcov = matrix(c(1, rho, rho, 1), nrow = 2))

nls(formula= y ~ fun(x1, x2, rho), data= dat,  start=list(rho=0.51),  
     lower=0, upper=1, trace=TRUE)  

This yields an error message:

Error in nls(formula = y ~ fun(x1, x2, rho), data = dat, start = list(rho = 0.51),  : 
  singular gradient
In addition: Warning message:
In nls(formula = y ~ fun(x1, x2, rho), data = dat, start = list(rho = 0.51),  :
  upper or lower bounds ignored unless algorithm = "port"

What I don't understand:

  1. I have only one parameter (rho), so the gradient matrix has a single column; for it to be singular, that gradient would essentially have to be zero. Why should the gradient be zero? (See the quick check sketched after this list.)
  2. The starting value can hardly be the problem: I know the true rho = 0.5, so starting at 0.51 should be fine, shouldn't it?
  3. The data cannot be exactly linearly dependent either, because I added an error term to y.
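
In case it helps with the diagnosis: nls evaluates the right-hand side of the formula with the whole data columns, so fun() is called with length-25 vectors for x1 and x2. A quick check of what that call looks like outside of nls (just a sketch, using the objects defined above; rhs is only an illustrative name):

# what nls does internally: evaluate the model function on the full columns
rhs <- try(fun(x1, x2, 0.51))   # may warn or fail, depending on the mnormt version

# inside fun(), pmnorm() receives x = c(qnorm(x1), qnorm(x2)), i.e. one
# vector of length 50 instead of a 25 x 2 matrix of points, so the call
# is not vectorised over the 25 observations
length(rhs)   # compare with length(y), which is 25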

I would appreciate any help very much. Thanks in advance.


Solution

  • Perhaps "optim" does a better job than "nls":

    library(mnormt)
    
    # given values for independent variables
    x1 <- c(rep(0.1,5), rep(0.2,5), rep(0.3,5), rep(0.4,5), rep(0.5,5))
    x2 <- c(rep(c(0.1,0.2,0.3,0.4,0.5),5))
    
    
    ## 1 generate values for dependent variable (incl. error term)
    #  from bivariate normal distribution with assumed correlation rho=0.5
    
    fun  <- function(b) pmnorm(x = c(qnorm(x1[b]), qnorm(x2[b])), 
                               mean = c(0, 0), 
                               varcov = matrix(c(1, 0.5, 0.5, 1), nrow = 2))
    
    set.seed(123)
    y <- sapply(1:25,  function(b) fun(b)) + runif(25)/1000  
    
    
    # put it in data frame
    dat <- data.frame(y=y, x1=x1, x2=x2 )
    
    
    
    
    # 2 : calculate non-linear regression from the generated data
    # use rho=0.51 as starting value
    
    fun <- function(x1, x2,rho) pmnorm(x = c(qnorm(x1), qnorm(x2)), 
                                       mean = c(0, 0), 
                                       varcov = matrix(c(1, rho, rho, 1), nrow = 2))
    
    # residual sum of squares as a function of rho
    f <- function(rho) { sum( sapply( 1:nrow(dat),
                                      function(i){
                                        (fun(dat[i,2],dat[i,3],rho) - dat[i,1])^2 
                                      } ) ) } 
    
    optim(0.51, f, method="BFGS")
    

    The result is not that bad:

    > optim(0.51, f, method="BFGS")
    $par
    [1] 0.5043406
    
    $value
    [1] 3.479377e-06
    
    $counts
    function gradient 
          14        4 
    
    $convergence
    [1] 0
    
    $message
    NULL
    

    In terms of the residual sum of squares on this simulated sample, it is even a little better than the true value 0.5:

    > f(0.5043406)
    [1] 3.479377e-06
    > f(0.5)
    [1] 1.103484e-05
    > 
    

    Let's check another start value:

    > optim(0.8, f, method="BFGS")
    $par
    [1] 0.5043407
    
    $value
    [1] 3.479377e-06
    
    $counts
    function gradient 
          28        6 
    
    $convergence
    [1] 0
    
    $message
    NULL
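
    As for nls itself: fun() above is not vectorised over the observations. When nls evaluates fun(x1, x2, rho) with the full data columns, pmnorm() receives c(qnorm(x1), qnorm(x2)) as one long vector instead of a 25 x 2 matrix of points, which is presumably what trips nls up. A sketch of a vectorised variant (fun_vec is just an illustrative name), combined with algorithm = "port" so that the bounds are actually used, might work as well:

    # vectorised mean function: one row of (qnorm(x1), qnorm(x2)) per observation,
    # so pmnorm() returns one fitted probability per row of dat
    fun_vec <- function(x1, x2, rho) {
      pmnorm(x = cbind(qnorm(x1), qnorm(x2)),
             mean = c(0, 0),
             varcov = matrix(c(1, rho, rho, 1), nrow = 2))
    }
    
    # "port" is the only nls algorithm that uses lower/upper bounds
    nls(y ~ fun_vec(x1, x2, rho), data = dat, start = list(rho = 0.51),
        algorithm = "port", lower = 0, upper = 1, trace = TRUE)

    Since there is only one bounded parameter, the one-dimensional optimize(f, interval = c(0, 0.99)) would be yet another alternative to the optim() call above (keeping the interval just below rho = 1, where varcov would become singular).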