Simultaneously fitting multiple models that differ only in terms of a multiplicative factor to a single dataset, in R

I have been struggling with this problem for a while and although I think I am close I can't seem to get to the answer. Say I have a dataset that I want to simultaneously fit multiple models to, but I want to constrain the models to be the same save for a multiplicative factor, so:

y=(a1 + a2.x1 + a3.x2)f

where f is the multiplicative factor and a1, a2, a3 are the estimated coefficients in the lm-call but are constrained to be the same for each model. I've been trying to find a solution by setting f as a discrete interaction parameter. Below is my attempt at this for some synthetic data:

#generate some data and label it
x=runif(75, 0, 10)
f1=gl(n = 3, k = 25, labels = c(1,2,3))
modmat=model.matrix(~ x * f1, data.frame(f1 = f1, x = x^2)) #<---x^2!

#use it to generate dataset initialized with some random coefficients
coeff=c(1, 3, -2, 1.5, 2, 3) 
y=rnorm(n = 75, mean = modmat %*% coeff, sd = 0.5)
dat=data.frame(y = y, f1 = f1, x = x)

plot(x,y)

#regress on it
model=lm(y ~ I(x + x^2) * f1 ) #<---apply interaction to whole model

summary(model)

#generate fits
p1=predict(model,data.frame(f1='1',x))
p2=predict(model,data.frame(f1='2',x))
p3=predict(model,data.frame(f1='3',x))

#plot
par(mfrow=c(1,1))
plot(x,y)
points(x,p1,col='red',pch=19,cex=.5)
points(x,p2,col='blue',pch=19,cex=.5)
points(x,p3,col='green',pch=19,cex=.5)

which gives me:

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -5.52794    1.10183  -5.017 3.91e-06 
I(x + x^2)      2.77141    0.01745 158.849  < 2e-16 
f12            -2.51479    1.44770  -1.737   0.0868   
f13            -1.29706    1.42595  -0.910   0.3662    
I(x + x^2):f12  1.74424    0.02856  61.075  < 2e-16 
I(x + x^2):f13  2.67646    0.02776  96.398  < 2e-16

If I understand this correctly, and I may not, then this is not what I'm looking for as the interaction terms effectively result in a different model rather than the same model with a factor applied. I feel this is the right approach but I'm not well versed enough in using interaction parameter to get it to work.

Any help would be greatly appreciated!

glenn

Solution

I really can see no way to transform this to a simple linear model. The fact that you have two different levels for estimating the parameters makes that tricky. So instead of linear model, you could use nls and fit a non-linear model.

Here i'll recreate some sample data based on my understanding of the model

set.seed(15)
dat <- data.frame(
   x = runif(75, 0, 10),
   f = gl(n = 3, k = 25, labels = c(1,2,3))
)

dat <- transform(dat, y = 1 + (-2 * x + 1.5 * x^2) * c(1,1.5,2)[f] + rnorm(75))

plot(y~x, dat, col=dat$f)

enter image description here

So i've assumed that there's just one intercept term (int). Also note that the first multiplier (for level=="1") should be fixed at 1. If you allowed that to vary, you would have an identify-ability problem with the coefficients of your x terms. Actually, it doesn't much matter in creating sample data, but it does matter in model fitting.

So here's how I would fit such a model

fit <- nls(y ~ int + (a1 * x + a2 * x^2) * c(1,f2,f3)[f], dat,
    start=list(int=1, a1=1, a2=1,f2=1, f3=1))
fit

# Nonlinear regression model
#   model: y ~ int + (a1 * x + a2 * x^2) * c(1, f2, f3)[f]
#    data: dat
#    int     a1     a2     f2     f3 
#  1.148 -2.037  1.502  1.507  2.004 
#  residual sum-of-squares: 56.99

# Number of iterations to convergence: 4 
# Achieved convergence tolerance: 9.172e-07

Note that it's always a good idea to supply start= values to nls even if we just set everything to 1 by default. So we do a pretty decent job of recovering our simulated values. Again, we're fixing f1=1. Finally we can compare the true values to the predicted values

plot(predict(fit)~y, dat, col=dat$f, xlab="true", ylab="predicted")
abline(0,1, lty=2, col="grey")

so that's a pretty good fit.

enter image description here