Is there any way to get the function below to return a consistent type? I'm doing some work with Julia GLM (love it). I wrote a function that creates all of the possible regression combinations for a dataset. However, my current method of creating a @formula returns a different type for every different length of rhs
.
using GLM
function compose(lhs::Symbol, rhs::AbstractVector{Symbol})
ts = term.((1, rhs...))
term(lhs) ~ sum(ts)
end
Using @code_warntype
for a simple example returns the following
julia> @code_warntype compose(:y, [:x])
Variables
#self#::Core.Compiler.Const(compose, false)
lhs::Symbol
rhs::Array{Symbol,1}
ts::Any
Body::FormulaTerm{Term,_A} where _A
1 ─ %1 = Core.tuple(1)::Core.Compiler.Const((1,), false)
│ %2 = Core._apply(Core.tuple, %1, rhs)::Core.Compiler.PartialStruct(Tuple{Int64,Vararg{Symbol,N} where N}, Any[Core.Compiler.Const(1, false), Vararg{Symbol,N} where N])
│ %3 = Base.broadcasted(Main.term, %2)::Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple},Nothing,typeof(term),_A} where _A<:Tuple
│ (ts = Base.materialize(%3))
│ %5 = Main.term(lhs)::Term
│ %6 = Main.sum(ts)::Any
│ %7 = (%5 ~ %6)::FormulaTerm{Term,_A} where _A
└── return %7
And checking the return type of a few different inputs:
julia> compose(:y, [:x]) |> typeof
FormulaTerm{Term,Tuple{ConstantTerm{Int64},Term}}
julia> compose(:y, [:x1, :x2]) |> typeof
FormulaTerm{Term,Tuple{ConstantTerm{Int64},Term,Term}}
We see that as the length of rhs
changes, the return type changes.
Can I change my compose
function so that it always returns the same type? This isn't really a big issue. Compiling for each new number of regressors only takes ~70ms. This is really more of a "how can I improve my Julia skills?"
I do not think you can avoid type unstability here as ~
expects RHS to be a Term
or a Tuple
of Term
s.
However, the most compilation cost you are paying is in term.((1, rhs...))
as you invoke broadcasting which is expensive to compile. Here is how you can do it in a cheaper way:
function compose(lhs::Symbol, rhs::AbstractVector{Symbol})
term(lhs) ~ ntuple(i -> i <= length(rhs) ? term(rhs[i]) : term(1) , length(rhs)+1)
end
or (this is a bit slower but more like your original code):
function compose(lhs::Symbol, rhs::AbstractVector{Symbol})
term(lhs) ~ map(term, (1, rhs...))
end
Finally - if you are doing such computations maybe you can drop using the formula interface but feed to lm
or glm
directly matrices as RHS in which case it should be able to avoid extra compilation cost, e.g.:
julia> y = rand(10);
julia> x = rand(10, 2);
julia> @time lm(x,y);
0.000048 seconds (18 allocations: 1.688 KiB)
julia> x = rand(10, 3);
julia> @time lm(x,y);
0.000038 seconds (18 allocations: 2.016 KiB)
julia> y = rand(100);
julia> x = rand(100, 50);
julia> @time lm(x,y);
0.000263 seconds (22 allocations: 121.172 KiB)