I have been struggling with this problem for a while and although I think I am close I can't seem to get to the answer. Say I have a dataset that I want to simultaneously fit multiple models to, but I want to constrain the models to be the same save for a multiplicative factor, so:
y=(a1 + a2.x1 + a3.x2)f
where f is the multiplicative factor and a1, a2, a3 are the estimated coefficients in the lm-call but are constrained to be the same for each model. I've been trying to find a solution by setting f as a discrete interaction parameter. Below is my attempt at this for some synthetic data:
#generate some data and label it
x=runif(75, 0, 10)
f1=gl(n = 3, k = 25, labels = c(1,2,3))
modmat=model.matrix(~ x * f1, data.frame(f1 = f1, x = x^2)) #<---x^2!
#use it to generate dataset initialized with some random coefficients
coeff=c(1, 3, -2, 1.5, 2, 3)
y=rnorm(n = 75, mean = modmat %*% coeff, sd = 0.5)
dat=data.frame(y = y, f1 = f1, x = x)
plot(x,y)
#regress on it
model=lm(y ~ I(x + x^2) * f1 ) #<---apply interaction to whole model
summary(model)
#generate fits
p1=predict(model,data.frame(f1='1',x))
p2=predict(model,data.frame(f1='2',x))
p3=predict(model,data.frame(f1='3',x))
#plot
par(mfrow=c(1,1))
plot(x,y)
points(x,p1,col='red',pch=19,cex=.5)
points(x,p2,col='blue',pch=19,cex=.5)
points(x,p3,col='green',pch=19,cex=.5)
which gives me:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.52794 1.10183 -5.017 3.91e-06
I(x + x^2) 2.77141 0.01745 158.849 < 2e-16
f12 -2.51479 1.44770 -1.737 0.0868
f13 -1.29706 1.42595 -0.910 0.3662
I(x + x^2):f12 1.74424 0.02856 61.075 < 2e-16
I(x + x^2):f13 2.67646 0.02776 96.398 < 2e-16
If I understand this correctly, and I may not, then this is not what I'm looking for as the interaction terms effectively result in a different model rather than the same model with a factor applied. I feel this is the right approach but I'm not well versed enough in using interaction parameter to get it to work.
Any help would be greatly appreciated!
glenn
I really can see no way to transform this to a simple linear model. The fact that you have two different levels for estimating the parameters makes that tricky. So instead of linear model, you could use nls
and fit a non-linear model.
Here i'll recreate some sample data based on my understanding of the model
set.seed(15)
dat <- data.frame(
x = runif(75, 0, 10),
f = gl(n = 3, k = 25, labels = c(1,2,3))
)
dat <- transform(dat, y = 1 + (-2 * x + 1.5 * x^2) * c(1,1.5,2)[f] + rnorm(75))
plot(y~x, dat, col=dat$f)
So i've assumed that there's just one intercept term (int
). Also note that the first multiplier (for level=="1") should be fixed at 1. If you allowed that to vary, you would have an identify-ability problem with the coefficients of your x
terms. Actually, it doesn't much matter in creating sample data, but it does matter in model fitting.
So here's how I would fit such a model
fit <- nls(y ~ int + (a1 * x + a2 * x^2) * c(1,f2,f3)[f], dat,
start=list(int=1, a1=1, a2=1,f2=1, f3=1))
fit
# Nonlinear regression model
# model: y ~ int + (a1 * x + a2 * x^2) * c(1, f2, f3)[f]
# data: dat
# int a1 a2 f2 f3
# 1.148 -2.037 1.502 1.507 2.004
# residual sum-of-squares: 56.99
# Number of iterations to convergence: 4
# Achieved convergence tolerance: 9.172e-07
Note that it's always a good idea to supply start=
values to nls
even if we just set everything to 1 by default. So we do a pretty decent job of recovering our simulated values. Again, we're fixing f1=1
. Finally we can compare the true values to the predicted values
plot(predict(fit)~y, dat, col=dat$f, xlab="true", ylab="predicted")
abline(0,1, lty=2, col="grey")
so that's a pretty good fit.