Search code examples
rregressioninteraction

Simultaneously fitting multiple models that differ only in terms of a multiplicative factor to a single dataset, in R


I have been struggling with this problem for a while and although I think I am close I can't seem to get to the answer. Say I have a dataset that I want to simultaneously fit multiple models to, but I want to constrain the models to be the same save for a multiplicative factor, so:

y=(a1 + a2.x1 + a3.x2)f

where f is the multiplicative factor and a1, a2, a3 are the estimated coefficients in the lm-call but are constrained to be the same for each model. I've been trying to find a solution by setting f as a discrete interaction parameter. Below is my attempt at this for some synthetic data:

#generate some data and label it
x=runif(75, 0, 10)
f1=gl(n = 3, k = 25, labels = c(1,2,3))
modmat=model.matrix(~ x * f1, data.frame(f1 = f1, x = x^2)) #<---x^2!

#use it to generate dataset initialized with some random coefficients
coeff=c(1, 3, -2, 1.5, 2, 3) 
y=rnorm(n = 75, mean = modmat %*% coeff, sd = 0.5)
dat=data.frame(y = y, f1 = f1, x = x)

plot(x,y)

#regress on it
model=lm(y ~ I(x + x^2) * f1 ) #<---apply interaction to whole model

summary(model)

#generate fits
p1=predict(model,data.frame(f1='1',x))
p2=predict(model,data.frame(f1='2',x))
p3=predict(model,data.frame(f1='3',x))

#plot
par(mfrow=c(1,1))
plot(x,y)
points(x,p1,col='red',pch=19,cex=.5)
points(x,p2,col='blue',pch=19,cex=.5)
points(x,p3,col='green',pch=19,cex=.5)

which gives me:

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -5.52794    1.10183  -5.017 3.91e-06 
I(x + x^2)      2.77141    0.01745 158.849  < 2e-16 
f12            -2.51479    1.44770  -1.737   0.0868   
f13            -1.29706    1.42595  -0.910   0.3662    
I(x + x^2):f12  1.74424    0.02856  61.075  < 2e-16 
I(x + x^2):f13  2.67646    0.02776  96.398  < 2e-16 

If I understand this correctly, and I may not, then this is not what I'm looking for as the interaction terms effectively result in a different model rather than the same model with a factor applied. I feel this is the right approach but I'm not well versed enough in using interaction parameter to get it to work.

Any help would be greatly appreciated!

glenn


Solution

  • I really can see no way to transform this to a simple linear model. The fact that you have two different levels for estimating the parameters makes that tricky. So instead of linear model, you could use nls and fit a non-linear model.

    Here i'll recreate some sample data based on my understanding of the model

    set.seed(15)
    dat <- data.frame(
       x = runif(75, 0, 10),
       f = gl(n = 3, k = 25, labels = c(1,2,3))
    )
    
    dat <- transform(dat, y = 1 + (-2 * x + 1.5 * x^2) * c(1,1.5,2)[f] + rnorm(75))
    
    plot(y~x, dat, col=dat$f)
    

    enter image description here

    So i've assumed that there's just one intercept term (int). Also note that the first multiplier (for level=="1") should be fixed at 1. If you allowed that to vary, you would have an identify-ability problem with the coefficients of your x terms. Actually, it doesn't much matter in creating sample data, but it does matter in model fitting.

    So here's how I would fit such a model

    fit <- nls(y ~ int + (a1 * x + a2 * x^2) * c(1,f2,f3)[f], dat,
        start=list(int=1, a1=1, a2=1,f2=1, f3=1))
    fit
    
    # Nonlinear regression model
    #   model: y ~ int + (a1 * x + a2 * x^2) * c(1, f2, f3)[f]
    #    data: dat
    #    int     a1     a2     f2     f3 
    #  1.148 -2.037  1.502  1.507  2.004 
    #  residual sum-of-squares: 56.99
    
    # Number of iterations to convergence: 4 
    # Achieved convergence tolerance: 9.172e-07
    

    Note that it's always a good idea to supply start= values to nls even if we just set everything to 1 by default. So we do a pretty decent job of recovering our simulated values. Again, we're fixing f1=1. Finally we can compare the true values to the predicted values

    plot(predict(fit)~y, dat, col=dat$f, xlab="true", ylab="predicted")
    abline(0,1, lty=2, col="grey")
    

    so that's a pretty good fit.

    enter image description here