It appears the reference level is selected as the first unique element of the categorical variable. However, in my case, the reference is P (and not A).
ols_all= lm( @formula( value ~ Treatment ), s_treat)
gives
value ~ 1 + Treatment
Coefficients:
────────────────────────────────────────────────────────────────────────
               Coef.  Std. Error      t  Pr(>|t|)  Lower 95%  Upper 95%
────────────────────────────────────────────────────────────────────────
(Intercept)   19.0845    0.531803  35.89    <1e-99   18.039     20.13
Treatment: P   5.5775    0.752082   7.42    <1e-12    4.09895    7.05605
────────────────────────────────────────────────────────────────────────
What I really want is the coefficient for Treatment: A, with P (the placebo, i.e. the control group) as the reference. Granted, I could rename the values of the variable. But SAS and R let you select the reference level, so I am hoping there is a way to do it with Julia GLM as well.
GLM.jl does not take the first unique element of a CategoricalVector as the reference, but the first level of that column. Therefore, if you reorder the levels, you change both the reference and the order in which the levels appear in the output. Here is an example:
julia> using CategoricalArrays
julia> using DataFrames
julia> using GLM
julia> y = rand(10)
10-element Vector{Float64}:
0.6680787249599323
0.4405942175942186
0.012595806803754828
0.21742822324104805
0.4588945549282415
0.05463125900208077
0.5889309471773907
0.014531957298235865
0.8444514228200215
0.13148010370633267
julia> x = categorical(rand(["a", "b", "c"], 10))
10-element CategoricalArray{String,1,UInt32}:
"b"
"b"
"a"
"a"
"c"
"a"
"c"
"c"
"a"
"b"
julia> df = DataFrame(x=x, y=y)
10×2 DataFrame
 Row │ x     y
     │ Cat…  Float64
─────┼─────────────────
   1 │ b     0.668079
   2 │ b     0.440594
   3 │ a     0.0125958
   4 │ a     0.217428
   5 │ c     0.458895
   6 │ a     0.0546313
   7 │ c     0.588931
   8 │ c     0.014532
   9 │ a     0.844451
  10 │ b     0.13148
julia> lm(@formula(y~x), df)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}}}}, Matrix{Float64}}
y ~ 1 + x
Coefficients:
────────────────────────────────────────────────────────────────────────
                 Coef.  Std. Error     t  Pr(>|t|)  Lower 95%  Upper 95%
────────────────────────────────────────────────────────────────────────
(Intercept)  0.282277     0.165972  1.70    0.1328  -0.110185   0.674739
x: b         0.131108     0.253527  0.52    0.6210  -0.468388   0.730603
x: c         0.0718425    0.253527  0.28    0.7851  -0.527653   0.671338
────────────────────────────────────────────────────────────────────────
julia> levels(df.x)
3-element Vector{String}:
"a"
"b"
"c"
julia> levels!(df.x, ["c", "b", "a"])
10-element CategoricalArray{String,1,UInt32}:
"b"
"b"
"a"
"a"
"c"
"a"
"c"
"c"
"a"
"b"
julia> lm(@formula(y~x), df)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}}}}, Matrix{Float64}}
y ~ 1 + x
Coefficients:
───────────────────────────────────────────────────────────────────────────
                  Coef.  Std. Error      t  Pr(>|t|)   Lower 95%  Upper 95%
───────────────────────────────────────────────────────────────────────────
(Intercept)   0.354119     0.191648   1.85    0.1071  -0.0990568   0.807295
x: b          0.0592652    0.271031   0.22    0.8331  -0.581622    0.700153
x: a         -0.0718425    0.253527  -0.28    0.7851  -0.671338    0.527653
───────────────────────────────────────────────────────────────────────────
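Applied to the case in the question, the same reordering might look like the sketch below. The contents of `s_treat` are not shown in the question, so the table here is a small made-up stand-in with the same column names (`Treatment`, `value`); only the `levels!` call and the model fit carry over to the real data.

```julia
using CategoricalArrays, DataFrames, GLM

# Hypothetical stand-in for the question's `s_treat` table (values are invented)
s_treat = DataFrame(
    Treatment = categorical(["A", "P", "A", "P", "A", "P"]),
    value     = [24.1, 19.3, 25.0, 18.7, 24.6, 19.5],
)

# By default the levels of string data are sorted, so "A" comes first
# and is used as the reference. Put the placebo "P" first instead.
levels!(s_treat.Treatment, ["P", "A"])

ols_all = lm(@formula(value ~ Treatment), s_treat)

# The intercept now describes P, and the remaining coefficient is "Treatment: A"
coefnames(ols_all)
```

Note that `levels!` modifies the column in place, so every later model fitted on this data frame will also use P as the reference.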
More advanced strategies are described here: https://juliastats.org/StatsModels.jl/stable/contrasts/.
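One of the strategies from that page, as a minimal sketch: instead of reordering the levels in the data, you can pass a `contrasts` keyword to `lm` and name the base level with `DummyCoding(base = ...)` from StatsModels. The data below reuses rounded values from the example above purely for illustration, and `m` is just a name chosen here.

```julia
using DataFrames, GLM, StatsModels

# Illustrative data: x and (rounded) y values from the example above
df = DataFrame(
    x = ["b", "b", "a", "a", "c", "a", "c", "c", "a", "b"],
    y = [0.67, 0.44, 0.01, 0.22, 0.46, 0.05, 0.59, 0.01, 0.84, 0.13],
)

# Use "c" as the reference level of x for this fit only;
# the data frame itself is left unchanged.
m = lm(@formula(y ~ x), df; contrasts = Dict(:x => DummyCoding(base = "c")))

# Coefficients are reported for the non-reference levels "a" and "b"
coefnames(m)
```

This is convenient when different models need different reference levels, since nothing has to be changed back in the data afterwards.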