I get this error when I try to use the k-nearest neighbors algorithm to impute missing values with Impute.jl:
using Impute, DataFrames
df = DataFrame(
    a=[1, 2, 3, 4, missing],
    b=[1, missing, 3, 4, missing],
    c=[1, 2, missing, 5, 8],
)
# 5×3 DataFrame
#  Row │ a        b        c
#      │ Int64?   Int64?   Int64?
# ─────┼───────────────────────────
#    1 │       1        1        1
#    2 │       2  missing        2
#    3 │       3        3  missing
#    4 │       4        4        5
#    5 │ missing  missing        8
julia> Impute.knn(Matrix(df), dims=:cols)
ERROR: MethodError: no method matching NearestNeighbors.KDTree(::Matrix{Int64}, ::Distances.Euclidean)
Closest candidates are:
NearestNeighbors.KDTree(::AbstractVecOrMat{T}, ::M; leafsize, storedata, reorder, reorderbuffer) where {T<:AbstractFloat, M<:Union{Distances.Chebyshev, Distances.Cityblock, Distances.Euclidean, Distances.Minkowski, Distances.WeightedCityblock, Distances.WeightedEuclidean, Distances.WeightedMinkowski}} at C:\Users\Shayan\.julia\packages\NearestNeighbors\huCPc\src\kd_tree.jl:85
NearestNeighbors.KDTree(::AbstractVector{V}, ::M; leafsize, storedata, reorder, reorderbuffer) where {V<:AbstractArray, M<:Union{Distances.Chebyshev, Distances.Cityblock, Distances.Euclidean, Distances.Minkowski, Distances.WeightedCityblock, Distances.WeightedEuclidean, Distances.WeightedMinkowski}} at C:\Users\Shayan\.julia\packages\NearestNeighbors\huCPc\src\kd_tree.jl:27
How should I fix this?
The problem is that I'm passing a Matrix with element type Union{Missing, Int64} rather than Union{Missing, Float64}. According to the error, NearestNeighbors.KDTree accepts an AbstractVecOrMat{T} where {T<:AbstractFloat}, and Int64 is not an AbstractFloat. So I should first convert the matrix and then pass the result to the knn imputer:
julia> Impute.knn(
           convert(Matrix{Union{Missing, Float64}}, Matrix(df)),
           dims=:cols
       )
5×3 Matrix{Union{Missing, Float64}}:
1.0 1.0 1.0
2.0 3.0 2.0
3.0 3.0 2.0
4.0 4.0 5.0
4.0 4.0 8.0
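The conversion step can be checked on its own with Base Julia only. The following is a minimal sketch under stated assumptions: the matrix M is made up (it is not the DataFrame above), float_with_missing is a hypothetical helper name, and nothing here calls Impute.jl. It only shows the element-type change that lets the KDTree method from the error message match.

```julia
# Made-up 2×2 matrix with the same element type as Matrix(df) above.
M = Union{Missing, Int64}[1 2; missing 4]

# The KDTree method in the error requires T <: AbstractFloat.
# The non-missing part of the element type here is Int64, which is not.
int_is_float = nonmissingtype(eltype(M)) <: AbstractFloat

# Hypothetical helper: widen the non-missing part of the eltype to Float64,
# using the same convert call as in the answer.
float_with_missing(A::AbstractMatrix) =
    convert(Matrix{Union{Missing, Float64}}, A)

F = float_with_missing(M)

# After conversion the non-missing part is Float64, and missing is preserved.
float_is_float = nonmissingtype(eltype(F)) <: AbstractFloat
```

Here int_is_float is false and float_is_float is true, which matches the MethodError: the call only dispatches once the non-missing element type is a floating-point type.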
After this, I can narrow the eltype of the output with identity.(result), assuming I bound the output of the imputation to result:
julia> identity.(result)
5×3 Matrix{Float64}:
1.0 1.0 1.0
2.0 3.0 2.0
3.0 3.0 2.0
4.0 4.0 5.0
4.0 4.0 8.0
The reason for this last step is that many functions don't accept an AbstractMatrix whose element type is Union{Missing, T}, even when no missing values are actually left. So narrowing the element type is often unavoidable in such situations.
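As a small illustration of the narrowing step, here is a sketch in Base Julia only, with made-up matrices standing in for the real imputation output. It compares identity.(…) with convert. One difference worth knowing: if a missing is left over, identity.(…) quietly keeps the Union element type, whereas convert to Matrix{Float64} throws, so it can double as a check that the imputation filled everything.

```julia
# Made-up stand-in for the imputed output: no missing values are left,
# but the element type still allows them.
result = Union{Missing, Float64}[1.0 1.0; 2.0 3.0]

# Broadcasting identity rebuilds the array with the tightest element type.
narrowed = identity.(result)

# convert gives the same matrix here, but it errors if a missing remains.
strict = convert(Matrix{Float64}, result)

# With a leftover missing, identity.(...) keeps the Union element type.
leftover = Union{Missing, Float64}[1.0 missing; 2.0 3.0]
still_union = identity.(leftover)

# convert refuses to drop the missing and throws instead.
convert_threw = try
    convert(Matrix{Float64}, leftover)
    false
catch
    true
end
```

Which one to use is a judgment call: identity.(…) never fails, while convert makes an incomplete imputation visible right away.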