I am trying to run a linear regression with GLM in Julia, using a matrix as input rather than a DataFrame.
The inputs are:
julia> x
4×2 Matrix{Int64}:
 1  1
 2  2
 3  3
 4  4
julia> y
4-element Vector{Int64}:
0
2
4
6
But when I tried to fit it using the lm function, I found that an intercept is not included by default:
julia> lr = lm(x, y)
LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}:
Coefficients:
───────────────────────────────────────────────────────────────
        Coef.  Std. Error     t  Pr(>|t|)   Lower 95%  Upper 95%
───────────────────────────────────────────────────────────────
x1   0.666667   1.11848e7  0.00    1.0000  -4.81244e7  4.81244e7
x2   0.666667   1.11848e7  0.00    1.0000  -4.81244e7  4.81244e7
───────────────────────────────────────────────────────────────
I checked the official docs of GLM, but they only explain the usage of DataFrames as input. Is there a way of adding an intercept to the model when using matrices as input, without altering the input (such as adding a column of 1s to x)?
If you are using the X, y method, you are responsible for constructing the design matrix yourself. If you do not want to do that, use the formula method. This requires a bit of intermediate setup with your example, as the data needs to be in tabular form, but you can just create a named tuple:
data = @views (;y, x1 = x[:, 1], x2 = x[:, 2])
lm(@formula(y ~ 1 + x1 + x2), data)
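Once fitted that way, the intercept is just another coefficient. As a quick sketch of how to inspect it (the coef/coeftable calls are my addition for illustration, using GLM's standard accessors):

lr = lm(@formula(y ~ 1 + x1 + x2), data)
coef(lr)       # vector of estimates; the first entry is the intercept
coeftable(lr)  # full table, now with an (Intercept) row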
If you have a DataFrame or similar table at hand, you can (probably) use it directly.
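For example, assuming DataFrames.jl is installed (this construction is mine, not part of the original answer), the same fit would look like:

using DataFrames
df = DataFrame(y = y, x1 = x[:, 1], x2 = x[:, 2])  # same columns as the named tuple above
lm(@formula(y ~ 1 + x1 + x2), df)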
(IIRC, you could also just write @formula(y ~ x1 + x2), and it will add the intercept automatically, as in R. But I prefer the explicit specification.)
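In that case the call would simply be (a sketch; the intercept term is implied by the formula, as in R):

lm(@formula(y ~ x1 + x2), data)  # intercept added implicitly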