no method matching NearestNeighbors.KDTree(::Matrix{Int64}, ::Distances.Euclidean) in Impute.knn


I get the following error when I try to impute missing values with the k-nearest-neighbors (kNN) algorithm from Impute.jl:

using Impute, DataFrames

df = DataFrame(
  a=[1,2,3,4,missing],
  b=[1, missing, 3, 4, missing],
  c=[1, 2, missing, 5, 8],
)
# 5×3 DataFrame
#  Row │ a        b        c
#      │ Int64?   Int64?   Int64?
# ─────┼───────────────────────────
#    1 │       1        1        1
#    2 │       2  missing        2
#    3 │       3        3  missing
#    4 │       4        4        5
#    5 │ missing  missing        8

julia> Impute.knn(Matrix(df), dims=:cols)
ERROR: MethodError: no method matching NearestNeighbors.KDTree(::Matrix{Int64}, ::Distances.Euclidean)
Closest candidates are:
  NearestNeighbors.KDTree(::AbstractVecOrMat{T}, ::M; leafsize, storedata, reorder, reorderbuffer) where {T<:AbstractFloat, M<:Union{Distances.Chebyshev, Distances.Cityblock, Distances.Euclidean, Distances.Minkowski, Distances.WeightedCityblock, Distances.WeightedEuclidean, Distances.WeightedMinkowski}} at C:\Users\Shayan\.julia\packages\NearestNeighbors\huCPc\src\kd_tree.jl:85
  NearestNeighbors.KDTree(::AbstractVector{V}, ::M; leafsize, storedata, reorder, reorderbuffer) where {V<:AbstractArray, M<:Union{Distances.Chebyshev, Distances.Cityblock, Distances.Euclidean, Distances.Minkowski, Distances.WeightedCityblock, Distances.WeightedEuclidean, Distances.WeightedMinkowski}} at C:\Users\Shayan\.julia\packages\NearestNeighbors\huCPc\src\kd_tree.jl:27
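
Both candidate signatures only match when the array's element type T is a subtype of AbstractFloat. Presumably (an assumption, not something the error states) Impute.knn removes the missing values before building the tree, so what matters is the non-missing part of the element type. The base-Julia sketch below checks that condition with `nonmissingtype`, which strips `Missing` from a `Union` element type. It does not load Impute.jl or DataFrames.jl; the vectors `a`, `b` and `c` simply repeat the columns of `df`.

```julia
# Same values as Matrix(df): Int64 entries with some missings
a = [1, 2, 3, 4, missing]
b = [1, missing, 3, 4, missing]
c = [1, 2, missing, 5, 8]
M = hcat(a, b, c)                    # Matrix{Union{Missing, Int64}}

# nonmissingtype removes Missing from the Union element type
T = nonmissingtype(eltype(M))
println(T)                           # Int64
println(T <: AbstractFloat)          # false: no KDTree method can match

# After converting, the non-missing part is Float64, which does match
Mf = convert(Matrix{Union{Missing, Float64}}, M)
println(nonmissingtype(eltype(Mf)) <: AbstractFloat)   # true
```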

How should I fix this?


Solution

  • The problem is that I'm passing a matrix with element type Union{Missing, Int64} rather than Union{Missing, Float64}. As the error shows, NearestNeighbors.KDTree only accepts an AbstractVecOrMat{T} where {T<:AbstractFloat}, so integer data matches none of its methods. The fix is to convert the matrix to a floating-point element type first and then pass the result to the kNN imputer:

    julia> result = Impute.knn(
             convert(Matrix{Union{Missing, Float64}}, Matrix(df)),
             dims=:cols
           )
    5×3 Matrix{Union{Missing, Float64}}:
     1.0  1.0  1.0
     2.0  3.0  2.0
     3.0  3.0  2.0
     4.0  4.0  5.0
     4.0  4.0  8.0
    

    Additional Point

    After this, assuming the imputed matrix is bound to result, I can narrow its eltype with identity.(result):

    julia> identity.(result)
    5×3 Matrix{Float64}:
     1.0  1.0  1.0
     2.0  3.0  2.0
     3.0  3.0  2.0
     4.0  4.0  5.0
     4.0  4.0  8.0
    

    The reason for this last step is that many functions only accept an AbstractMatrix whose element type is a plain type such as Float64, not Union{Missing, T}, even when no missing values remain in the matrix. Narrowing the element type is therefore often necessary before passing the imputed data on.
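
A minimal base-Julia sketch of this dispatch issue follows. `needs_floats` is a hypothetical stand-in for any function whose signature requires `AbstractMatrix{<:AbstractFloat}`, like the KDTree methods in the error, and `result` is a hand-made stand-in for an imputed matrix, not actual Impute.knn output. It also shows a caveat: `identity.` narrows the element type only when no `missing` is left, whereas `convert` to a Float64 array throws instead.

```julia
# Hypothetical function with the same kind of constraint as KDTree:
# it only accepts matrices of floating-point numbers.
needs_floats(X::AbstractMatrix{<:AbstractFloat}) = sum(X)

# Stand-in for an imputation result: no missings left,
# but the element type still allows them.
result = convert(Matrix{Union{Missing, Float64}}, [1.0 1.0; 2.0 3.0])

# Union{Missing, Float64} is not a subtype of AbstractFloat,
# so calling needs_floats(result) would throw a MethodError.
println(applicable(needs_floats, result))   # false

# identity. rebuilds the array from the actual values, so the
# element type narrows to Float64 when no missing remains.
narrowed = identity.(result)
println(eltype(narrowed))                   # Float64
println(needs_floats(narrowed))             # 7.0

# If a missing survives, identity. silently keeps the Union type...
partial = identity.(Union{Missing, Float64}[1.0, missing])
println(eltype(partial))                    # Union{Missing, Float64}

# ...whereas convert fails loudly on the leftover missing.
try
    convert(Vector{Float64}, Union{Missing, Float64}[1.0, missing])
catch err
    println(typeof(err))                    # MethodError
end
```

Which behavior is preferable depends on whether a leftover missing should be treated as a bug (use convert) or tolerated (use identity.).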