Fitting custom regression equation with data in R


Let's say I have the data below:

  library(zoo)
  Dates = seq(as.Date('2000-01-01'), as.Date('2005-12-31'), by = '6 months')
  Data = rbind(data.frame(time = Dates, y = rnorm(length(Dates), 0, 10), month = as.factor(format(Dates, '%m')), type = 'A', M = log(12+0:11)),
                data.frame(time = Dates, y = rnorm(length(Dates), 0, 10), month = as.factor(format(Dates, '%m')), type = 'B', M = log(3+0:11)),
                data.frame(time = Dates, y = rnorm(length(Dates), 0, 10), month = as.factor(format(Dates, '%m')), type = 'C', M = log(2+0:11)),
                data.frame(time = Dates[3:10], y = rnorm(8, 0, 10), month = as.factor(format(Dates[3:10], '%m')), type = 'D', M = log(10+0:7)))
  XX = zoo(rt(length(Dates), 2, 0), Dates)

and a hypothetical model:

y[t, type] = Beta[0] + Beta[1] * xx[t] + Beta[2] * type + Beta[3] * month + Beta[4] * M[t, type] + error

I am trying to use the lm() function to estimate the parameters of the above model from this data, but I am not sure how to express the equation in lm().

Is it possible to use lm() for this model? What are the alternatives?


Solution

  • This doesn't seem like a particularly unusual model specification. You want:

    y[t, type] = Beta[0] + Beta[1] * xx[t] + Beta[2] * type + 
        Beta[3] * month + Beta[4] * M[t, type] + error
    

    Given the way your data are set up, you can think of this as indexing by i:

    y[t[i], type[i]] = ... Beta[1] * xx[t[i]] + Beta[2] * type[i] + ... +
        Beta[4]* M[t[i], type[i]] ...
    

    Which corresponds to this formula in lm (the 1 stands for the intercept/Beta[0] term, which will be added by default in any case unless you add 0 or -1 to your formula).

    y ~ 1 + xx + type + month + M
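
    For this formula to work, xx has to be a column of the data frame passed to lm(), whereas the question keeps it in a separate zoo object XX. A minimal sketch of one way to do that, by matching each row's date against the zoo index, might look like the following. The set.seed() call, the make_block() helper and the column name xx are assumptions added for illustration; they are not part of the original question.

```r
library(zoo)
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

Dates <- seq(as.Date("2000-01-01"), as.Date("2005-12-31"), by = "6 months")

# Hypothetical helper that builds one block of the question's data
make_block <- function(d, type, M) {
  data.frame(time = d,
             y = rnorm(length(d), 0, 10),
             month = factor(format(d, "%m")),
             type = type,
             M = M)
}

Data <- rbind(make_block(Dates,       "A", log(12 + 0:11)),
              make_block(Dates,       "B", log(3 + 0:11)),
              make_block(Dates,       "C", log(2 + 0:11)),
              make_block(Dates[3:10], "D", log(10 + 0:7)))
Data$type <- factor(Data$type)

XX <- zoo(rt(length(Dates), 2, 0), Dates)

# Look up xx[t] for every row by matching the row's date to the zoo index
Data$xx <- coredata(XX)[match(Data$time, index(XX))]

# Fit the model; the explicit 1 is the intercept (Beta[0])
fit <- lm(y ~ 1 + xx + type + month + M, data = Data)
summary(fit)
```

    The same value of xx is repeated for every type observed on a given date, which is what xx[t] in the model implies.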
    

    The one thing that doesn't match your desired specification is that, because type is a categorical variable (factor) with more than two levels, there won't be a single parameter Beta[2]: instead, R will internally convert type to a series of (n_level - 1) dummy variables (search for questions/material about "contrasts" to understand this process better).
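
    To see the dummy-variable expansion directly, you can inspect the design matrix that R builds for a factor. The sketch below uses a small toy data frame (hypothetical, not the question's data) and relies on R's default treatment contrasts, where the first level acts as the baseline.

```r
# Toy data (hypothetical), only to show how a four-level factor is expanded
toy <- data.frame(type = factor(c("A", "B", "C", "D")))

# With the default treatment contrasts, level "A" is the baseline and is
# absorbed into the intercept; the other three levels get one column each
mm <- model.matrix(~ type, data = toy)
colnames(mm)
mm

# The contrast coding itself: a 4 x 3 matrix of 0/1 indicators
contrasts(toy$type)
```

    Each of the resulting coefficients (typeB, typeC, typeD) is then the difference between that level and the baseline level A, rather than a single slope Beta[2].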