Search code examples
rregressionnon-linear-regression

How to solve singular gradient matrix error?


I'm creating a non-linear regression model to fit the data below

Plot

enter image description here

I keep on getting the error:

singular gradient matrix at initial parameter estimates

Can anyone explain what's wrong with my model? The equation I'm trying to write is y = (p1*logbasep2 (x))/(x^p3), where p1, p2, and p3 are constants.

p1 <- 1
p2 <- 1 
p3 <- 2


model1 = nls(ydata ~ ((p1*log(xdata, p3))/(xdata)^p2), start = list(p1 = p1, p2 = p2, p3 = p3))

Below is the sample output for xdata and ydata

  > dput(ydata)
c(2.675967443, 5.262229596, 2.756358345, 2.582628563, 2.578517983, 
2.572149035, 2.572149035, 2.419269246, 4.342393324, 4.57849526, 
2.414960542, 2.414960542, 2.414960542, 2.414960542, 2.655729394, 
5.205391565, 3.137641723, 2.503911184, 2.499359843, 2.499198257, 
2.498768034, 2.693594595, 5.231803091, 2.998312831, 2.520387095, 
2.518634129, 2.518634129, 2.518634129, 2.711184536, 5.229303652, 
3.003137243, 2.551123783, 2.516552812, 2.504450358, 2.484247615, 
2.581875759, 5.157438135, 3.310365728, 2.620786825, 2.458420168, 
2.436577883, 2.434535502, 2.606225185, 5.265676214, 2.71775484, 
2.61596361, 2.598126717, 2.598126717, 2.598126717, 2.803018082, 
4.934368949, 3.595430381, 3.031594421, 2.227695807, 2.207278748, 
2.200613613, 2.594364366, 5.215228585, 3.07169941, 2.694566482, 
2.511361391, 2.456389883, 2.456389883, 2.862120485, 5.202934582, 
3.056182323, 2.469690653, 2.469690653, 2.469690653, 2.469690653, 
2.437314286, 4.587186915, 4.302037827, 2.711703229, 2.346318322, 
2.308501078, 2.306938344, 2.30614524, 4.657971143, 4.158221237, 
2.943632973, 2.350070603, 2.296930829, 2.287027975, 2.531924554, 
5.071156271, 3.541488012, 2.65287316, 2.420471714, 2.391688, 
2.39039829, 2.477102765, 5.030773262, 3.642446383, 2.620965051, 
2.424021444, 2.402895805, 2.40179529, 2.584714789, 5.03335416, 
3.619673092, 2.583602564, 2.533903128, 2.326437301, 2.318314966, 
2.49144927, 4.897950266, 3.585821617, 3.227165648, 2.53767512, 
2.221395797, 2.038542282, 2.354867369, 4.95865857, 3.766909175, 
2.715186396, 2.382613432, 2.372757351, 2.449007707, 2.20573524, 
4.55514547, 3.91611881, 3.606189025, 2.303604277, 2.20810652, 
2.205100659, 3.300879888, 5.151795375, 2.75624017, 2.449071065, 
2.447337834, 2.447337834, 2.447337834, 2.528936269, 4.955034368, 
3.754254308, 2.751181588, 2.399415789, 2.308263059, 2.302914619, 
2.350317116, 4.873892721, 3.39391574, 2.606991064, 2.443820718, 
2.33106264, 2.33106264, 2.621925026, 5.267786769, 2.622588101, 
2.621925026, 2.621925026, 2.621925026, 2.621925026, 2.425160063, 
5.022138529, 3.663550495, 2.612718078, 2.425326541, 2.42594623, 
2.425160063, 2.625820509, 5.265415337, 2.713068882, 2.638650782, 
2.585780084, 2.586285083, 2.584979323, 2.508232606, 4.902729122, 
3.746937795, 3.015086226, 2.332707845, 2.267248424, 2.227057981, 
2.947719346, 5.098315798, 3.368997979, 2.39886785, 2.402312015, 
2.392233622, 2.39155339, 2.548810552, 4.525931048, 4.3760105, 
2.589251919, 2.429896804, 2.281201495, 2.248897682)
dput(xdata)
c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 
2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 
4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 
6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 
1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 
3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 
5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 
7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 
2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 
4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 
6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 
1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L)

Solution

  • Your problem is actually more mathematical than statistical, as it turns out. The problem is that log(x,b1) and log(x,b2) differ only by a constant factor (log(x,b1) = log(x,b2)*log(b2,b1)). This means that your p1 and p3 parameters are redundant. So everything works if you leave out p3 (this is just one choice of the many reasonable ways to change the formula ...)

    Using nls with the data argument is preferred, as it makes it easier to do things like predict new values:

    dd <- data.frame(x=xdata,y=ydata)
    model1 = nls(y ~ ((p1*log(x))/(x)^p2),
                 start = list(p1 = 1, p2 = 1),
                 data=dd)
    

    Now predict:

    xvec <- seq(1,7,length=101)
    par(las=1,bty="l")  ## cosmetic
    plot(y~x,data=dd)
    lines(xvec,predict(model1,newdata=data.frame(x=xvec)))
    

    enter image description here

    Note that there are some issues with this fit (the line systematically misses the data at both the lower and the upper end) ...