Tags: scikit-learn, julia, pycall

Using scikit-learn with Python 3.11 in Julia


I'm benchmarking clustering across several frameworks, but when calling scikit-learn from Julia I can't even get it to work. Here is the code:

using PyCall

Train = rand(Float64, 1611, 10)

py"""
def Silhouette_py(Train, k):
    from sklearn.metrics import silhouette_score
    from sklearn.cluster import KMeans
    model = KMeans(n_clusters=k)
    return silhouette_score(Train, model.labels_)
"""

function test(Train, k)
    py"Silhouette_py"(Train, k)
end

Calling it raises an error:

julia> test(Train, 3)
ERROR: PyError ($(Expr(:escape, :(ccall(#= C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:43 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'AttributeError'>
AttributeError("'KMeans' object has no attribute 'labels_'")
  File "C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\pyeval.jl", line 5, in Silouhette_py
    const _namespaces = Dict{Module,PyDict{String,PyObject,true}}()
                                       ^^^^^^^^^^^^^

Stacktrace:
  [1] pyerr_check
    @ C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\exception.jl:62 [inlined]
  [2] pyerr_check
    @ C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\exception.jl:66 [inlined]
  [3] _handle_error(msg::String)
    @ PyCall C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\exception.jl:83
  [4] macro expansion
    @ C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\exception.jl:97 [inlined]
  [5] #107
    @ C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:43 [inlined]
  [6] disable_sigint
    @ .\c.jl:473 [inlined]
  [7] __pycall!
    @ C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:42 [inlined]
  [8] _pycall!(ret::PyObject, o::PyObject, args::Tuple{Matrix{Float64}, Int64}, nargs::Int64, kw::Ptr{Nothing})
    @ PyCall C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:29
  [9] _pycall!(ret::PyObject, o::PyObject, args::Tuple{Matrix{Float64}, Int64}, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ PyCall C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:11
 [10] (::PyObject)(::Matrix{Float64}, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ PyCall C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:86
 [11] (::PyObject)(::Matrix{Float64}, ::Vararg{Any})
    @ PyCall C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:86
 [12] t(Train::Matrix{Float64}, k::Int64)
    @ Main .\REPL[12]:2
 [13] top-level scope
    @ REPL[20]:1

My libpython and Python configuration:

julia> PyCall.libpython
"C:\\Users\\Shayan\\AppData\\Local\\Programs\\Python\\Python311\\python311.dll"

julia> PyCall.pyversion
v"3.11.0"

julia> PyCall.current_python()
"C:\\Users\\Shayan\\AppData\\Local\\Programs\\Python\\Python311\\python.exe"

Further tests

However, if I run the same steps directly in the REPL:

julia> sk = pyimport("sklearn")

julia> model = sk.cluster.KMeans(3)
PyObject KMeans(n_clusters=3)

julia> model.fit(Train)
sys:1: ConvergenceWarning: Number of distinct clusters (1) found smaller than n_clusters (3). Possibly due to duplicate points in X.
PyObject KMeans(n_clusters=3)

julia> model.labels_
1611-element Vector{Int32}:
 0
 0
 0
 0
 0
 0
 ⋮

As you can see, this time there is no AttributeError("'KMeans' object has no attribute 'labels_'"). But I need this to work inside a function.
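What changed between the two runs is the model.fit(Train) call. scikit-learn estimators only create fitted attributes such as labels_ inside fit, so reading labels_ on an unfitted KMeans raises exactly this AttributeError. A minimal Python sketch of that behaviour (the helper name has_labels_before_and_after_fit is hypothetical; n_init=10 and random_state=0 are added assumptions that only keep the run quiet and reproducible):

```python
import numpy as np
from sklearn.cluster import KMeans


def has_labels_before_and_after_fit(X, k):
    """Hypothetical helper: report whether KMeans exposes labels_ before and after fit."""
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    before = hasattr(model, "labels_")  # unfitted: accessing labels_ raises AttributeError
    model.fit(X)                        # fit() computes the clusters and sets labels_
    after = hasattr(model, "labels_")
    return before, after


X = np.random.default_rng(0).random((1611, 10))
print(has_labels_before_and_after_fit(X, 3))  # (False, True)
```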


Solution

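  • The model is never fitted. Silhouette_py creates a KMeans object but never calls fit, and scikit-learn only sets labels_ during fit; in the REPL test, model.fit(Train) was called, which is why labels_ existed there. Fitting the model inside the function works: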

    using PyCall
    
    KMeans = pyimport("sklearn.cluster").KMeans
    silhouette_score = pyimport("sklearn.metrics").silhouette_score
    
    Train = rand(Float64, 1611, 10);
    
    function test(Train, k)
        model = KMeans(k)       # first positional argument is n_clusters
        model.fit(Train)        # fit() is what sets model.labels_
        return silhouette_score(Train, model.labels_)
    end
    
    julia> test(Train, 3)
    0.7885442174636309
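The original py"""...""" approach should also work once the Python function fits the model before reading labels_. A sketch of the corrected Python function, assuming the Julia test(Train, k) wrapper stays unchanged; n_init=10 is an added assumption that only silences the n_init FutureWarning some scikit-learn releases emit:

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def Silhouette_py(Train, k):
    # KMeans only sets labels_ inside fit(), so fit before scoring.
    model = KMeans(n_clusters=k, n_init=10)
    model.fit(Train)
    return silhouette_score(Train, model.labels_)
```

Placing these lines inside the py"""...""" string in place of the original definition should leave test(Train, k) working as written. Equivalently, model.fit_predict(Train) fits the model and returns the labels in one call.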