What is the best way to convert a categorical array to a plain numeric array? For example:
using CategoricalArrays
a = CategoricalArray(["X", "X", "Y", "Z", "Y", "Y", "Z"])
b = recode(a, "X"=>1, "Y"=>2, "Z"=>3)
As a result of the conversion, we still get a categorical array, even if we explicitly specify the type of assigned values:
b = recode(a, "X"=>1::Int64, "Y"=>2::Int64, "Z"=>3::Int64)
It looks like a different approach is needed here, but I can't think of where to look.
You have two natural options:
julia> recode(unwrap.(a), "X"=>1, "Y"=>2, "Z"=>3)
7-element Vector{Int64}:
1
1
2
3
2
2
3
or
julia> mapping = Dict("X"=>1, "Y"=>2, "Z"=>3)
Dict{String, Int64} with 3 entries:
"Y" => 2
"Z" => 3
"X" => 1
julia> [mapping[v] for v in a]
7-element Vector{Int64}:
1
1
2
3
2
2
3
The Dict approach is slower, but it is more flexible if you have many levels to map.
The key function here is unwrap, which drops the "categorical" notion of a CategoricalValue (in the Dict style, unwrap gets called automatically).
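To make the role of unwrap more concrete, here is a minimal sketch of what it does to a single element and to the whole array. The variable names (v, s, plain) are only for illustration, and it assumes a CategoricalArrays version that exports unwrap:

```julia
using CategoricalArrays

a = CategoricalArray(["X", "X", "Y", "Z", "Y", "Y", "Z"])

v = a[1]             # a CategoricalValue wrapping "X"
s = unwrap(v)        # the plain String "X"

plain = unwrap.(a)   # broadcast over the array: a plain Vector{String}

# recode on a plain vector returns a plain vector, not a CategoricalArray
b = recode(plain, "X"=>1, "Y"=>2, "Z"=>3)
```

Since recode only sees ordinary strings here, there is nothing categorical left for it to preserve in the result.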
Also note that if you just want the levelcode of each value stored in a CategoricalArray (something that R does by default), you can do:
julia> levelcode.(a)
7-element Vector{Int64}:
1
1
2
3
2
2
3
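Keep in mind that the codes returned by levelcode follow the order of the levels, not a mapping you choose yourself. A small sketch of this, using levels! to reorder the levels in place (the variable names are just for illustration):

```julia
using CategoricalArrays

a = CategoricalArray(["X", "X", "Y", "Z", "Y", "Y", "Z"])

# With the default level order ("X", "Y", "Z"), "X" gets code 1
codes_default = levelcode.(a)

# Reorder the levels in place; the stored values stay the same
levels!(a, ["Z", "Y", "X"])

# Same data, but now "Z" gets code 1 and "X" gets code 3
codes_reordered = levelcode.(a)
```

So if you need specific numbers for specific values, recode or the Dict approach is probably the safer choice.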
Also note that with levelcode, missing is mapped to missing:
julia> x = CategoricalArray(["Y", "X", missing, "Z"])
4-element CategoricalArray{Union{Missing, String},1,UInt32}:
"Y"
"X"
missing
"Z"
julia> levelcode.(x)
4-element Vector{Union{Missing, Int64}}:
2
1
missing
3
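The Dict approach does not handle missing on its own: a plain mapping[v] lookup throws a KeyError for missing unless missing is a key of the Dict. One possible way around this, as a sketch rather than the only option, is to pass missing through explicitly:

```julia
using CategoricalArrays

x = CategoricalArray(["Y", "X", missing, "Z"])
mapping = Dict("X"=>1, "Y"=>2, "Z"=>3)

# Keep missing as missing; look up every other value in the Dict
y = [ismissing(v) ? missing : mapping[unwrap(v)] for v in x]
```

Here y is a plain vector that allows missing, with the same layout as the levelcode result above.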