How to add an intercept for linear regression when using a matrix as input in GLM (Julia)


I am trying to fit a linear regression with GLM in Julia, using a matrix as input rather than a DataFrame.

The inputs are:

julia> x
4×2 Matrix{Int64}:
 1  1
 2  2
 3  3
 4  4

julia> y
4-element Vector{Int64}:
 0
 2
 4
 6

But when I fit it using the lm function, I found that no intercept is included by default:

julia> lr = lm(x, y)
LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}:

Coefficients:
───────────────────────────────────────────────────────────────
       Coef.  Std. Error     t  Pr(>|t|)   Lower 95%  Upper 95%
───────────────────────────────────────────────────────────────
x1  0.666667   1.11848e7  0.00    1.0000  -4.81244e7  4.81244e7
x2  0.666667   1.11848e7  0.00    1.0000  -4.81244e7  4.81244e7
───────────────────────────────────────────────────────────────

I checked the official GLM docs, but they only explain how to use DataFrames as input. Is there a way to add an intercept to the model when using a matrix as input, without altering the input (for example, by adding a column of 1s to x)?


Solution

  • If you are using the lm(X, y) method, you are responsible for constructing the design matrix yourself, including the intercept column. If you do not want to do that, use the formula method. With your example this requires a little intermediate setup, because the data needs to be in tabular form, but you can just create a named tuple of columns:

    data = @views (;y, x1 = x[:, 1], x2 = x[:, 2])
    lm(@formula(y ~ 1 + x1 + x2), data)
    

    If you already have a DataFrame or a similar table at hand, you can (probably) use it directly.

    (IIRC, you could also just write @formula(y ~ x1 + x2), and it will add the intercept automatically, as in R. But I prefer the explicit specification.)
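
For completeness, here is a minimal sketch of what "constructing the design matrix yourself" could look like with the lm(X, y) method. It builds a new matrix with hcat, so the original x is left unchanged. The data below is hypothetical and is not the data from the question: the two columns of x in the question are identical, which makes the fit rank-deficient and is presumably why the standard errors in the output above are so large. The names X, model and β are only illustrative.

```julia
using GLM

# Hypothetical predictors with non-collinear columns (not the question's data).
x = [1 2; 2 1; 3 4; 4 3]
# Response generated exactly as y = 1 + 2*x1 + 3*x2.
y = [9.0, 8.0, 19.0, 18.0]

# Prepend a column of ones; its coefficient plays the role of the intercept.
# hcat allocates a new matrix, so x itself is not modified.
X = hcat(ones(size(x, 1)), x)

model = lm(X, y)
β = coef(model)   # approximately [1.0, 2.0, 3.0]: intercept, then the two slopes
```

With the matrix method the coefficients are labelled by column position, so in the printed table the intercept shows up as x1 and the original predictors as x2 and x3.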