I'm creating a non-linear regression model to fit the data below
[plot of ydata against xdata]
I keep getting the error:
singular gradient matrix at initial parameter estimates
Can anyone explain what's wrong with my model? The equation I'm trying to fit is y = (p1 * log_p2(x)) / x^p3, where p1, p2, and p3 are constants.
p1 <- 1
p2 <- 1
p3 <- 2
model1 = nls(ydata ~ ((p1*log(xdata, p3))/(xdata)^p2), start = list(p1 = p1, p2 = p2, p3 = p3))
Below is the dput() output for ydata and xdata:
> dput(ydata)
c(2.675967443, 5.262229596, 2.756358345, 2.582628563, 2.578517983,
2.572149035, 2.572149035, 2.419269246, 4.342393324, 4.57849526,
2.414960542, 2.414960542, 2.414960542, 2.414960542, 2.655729394,
5.205391565, 3.137641723, 2.503911184, 2.499359843, 2.499198257,
2.498768034, 2.693594595, 5.231803091, 2.998312831, 2.520387095,
2.518634129, 2.518634129, 2.518634129, 2.711184536, 5.229303652,
3.003137243, 2.551123783, 2.516552812, 2.504450358, 2.484247615,
2.581875759, 5.157438135, 3.310365728, 2.620786825, 2.458420168,
2.436577883, 2.434535502, 2.606225185, 5.265676214, 2.71775484,
2.61596361, 2.598126717, 2.598126717, 2.598126717, 2.803018082,
4.934368949, 3.595430381, 3.031594421, 2.227695807, 2.207278748,
2.200613613, 2.594364366, 5.215228585, 3.07169941, 2.694566482,
2.511361391, 2.456389883, 2.456389883, 2.862120485, 5.202934582,
3.056182323, 2.469690653, 2.469690653, 2.469690653, 2.469690653,
2.437314286, 4.587186915, 4.302037827, 2.711703229, 2.346318322,
2.308501078, 2.306938344, 2.30614524, 4.657971143, 4.158221237,
2.943632973, 2.350070603, 2.296930829, 2.287027975, 2.531924554,
5.071156271, 3.541488012, 2.65287316, 2.420471714, 2.391688,
2.39039829, 2.477102765, 5.030773262, 3.642446383, 2.620965051,
2.424021444, 2.402895805, 2.40179529, 2.584714789, 5.03335416,
3.619673092, 2.583602564, 2.533903128, 2.326437301, 2.318314966,
2.49144927, 4.897950266, 3.585821617, 3.227165648, 2.53767512,
2.221395797, 2.038542282, 2.354867369, 4.95865857, 3.766909175,
2.715186396, 2.382613432, 2.372757351, 2.449007707, 2.20573524,
4.55514547, 3.91611881, 3.606189025, 2.303604277, 2.20810652,
2.205100659, 3.300879888, 5.151795375, 2.75624017, 2.449071065,
2.447337834, 2.447337834, 2.447337834, 2.528936269, 4.955034368,
3.754254308, 2.751181588, 2.399415789, 2.308263059, 2.302914619,
2.350317116, 4.873892721, 3.39391574, 2.606991064, 2.443820718,
2.33106264, 2.33106264, 2.621925026, 5.267786769, 2.622588101,
2.621925026, 2.621925026, 2.621925026, 2.621925026, 2.425160063,
5.022138529, 3.663550495, 2.612718078, 2.425326541, 2.42594623,
2.425160063, 2.625820509, 5.265415337, 2.713068882, 2.638650782,
2.585780084, 2.586285083, 2.584979323, 2.508232606, 4.902729122,
3.746937795, 3.015086226, 2.332707845, 2.267248424, 2.227057981,
2.947719346, 5.098315798, 3.368997979, 2.39886785, 2.402312015,
2.392233622, 2.39155339, 2.548810552, 4.525931048, 4.3760105,
2.589251919, 2.429896804, 2.281201495, 2.248897682)
> dput(xdata)
c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L,
4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L,
3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L,
4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L)
Your problem is actually more mathematical than statistical, as it turns out. The problem is that log(x, b1) and log(x, b2) differ only by a constant factor (log(x, b1) = log(x, b2) * log(b2, b1)). This means that your p1 and p3 parameters are redundant: the multiplier p1 and the log base p3 enter the model only through the single combination p1/log(p3), so they cannot be estimated separately, which is exactly what produces the singular gradient matrix. So everything works if you leave out p3 (this is just one choice of the many reasonable ways to change the formula ...)
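For example, here is a quick numeric check of that identity (not part of the original answer; the particular values of x, p1, and p3 are arbitrary):

## log(x, b) = log(x) / log(b), so p1 and the log base p3 appear only
## through the single quantity p1 / log(p3): two parameters, one effect
x  <- 1:7
p1 <- 1.5; p3 <- 2
all.equal(p1 * log(x, p3), (p1 / log(p3)) * log(x))  ## TRUE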
Using nls with the data argument is preferred, as it makes it easier to do things like predict new values:
dd <- data.frame(x = xdata, y = ydata)
model1 <- nls(y ~ p1 * log(x) / x^p2,
              start = list(p1 = 1, p2 = 1),
              data = dd)
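Before predicting, it can help to glance at the estimates (a small addition to the original answer); the fitted p1 in this reduced model corresponds to p1/log(p3) in the over-parameterized version:

## estimated coefficients and standard errors for the reduced model
coef(model1)
summary(model1)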
Now predict:
xvec <- seq(1,7,length=101)
par(las=1,bty="l") ## cosmetic
plot(y~x,data=dd)
lines(xvec,predict(model1,newdata=data.frame(x=xvec)))
Note that there are some issues with this fit (the line systematically misses the data at both the lower and the upper end) ...
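One way to see that misfit more directly (a suggestion beyond the original answer) is a residual plot; systematic structure, rather than random scatter around zero, indicates the functional form is too restrictive:

## residuals vs. x: a clear pattern here confirms the curve
## misses the data at the low and high ends of x
plot(dd$x, residuals(model1), xlab = "x", ylab = "residual")
abline(h = 0, lty = 2)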