I'm trying to perform some benchmarking in clustering by various frameworks, But in the case of porting Scikit-learn from python to julia, I can't make it even work. Here is the code:
using PyCall
Train = rand(Float64, 1611, 10)
py"""
def Silhouette_py(Train, k):
from sklearn.metrics import silhouette_score
from sklearn.cluster import KMeans
model = KMeans(n_clusters=k)
return silhouette_score(Train, model.labels_)
"""
function test(Train, k)
py"Silhouette_py"(Train, k)
end
The following code leads to an error:
julia> test(Train, 3)
ERROR: PyError ($(Expr(:escape, :(ccall(#= C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:43 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'AttributeError'>
AttributeError("'KMeans' object has no attribute 'labels_'")
File "C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\pyeval.jl", line 5, in Silouhette_py
const _namespaces = Dict{Module,PyDict{String,PyObject,true}}()
^^^^^^^^^^^^^
Stacktrace:
[1] pyerr_check
@ C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\exception.jl:62 [inlined]
[2] pyerr_check
@ C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\exception.jl:66 [inlined]
[3] _handle_error(msg::String)
@ PyCall C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\exception.jl:83
[4] macro expansion
@ C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\exception.jl:97 [inlined]
[5] #107
@ C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:43 [inlined]
[6] disable_sigint
@ .\c.jl:473 [inlined]
[7] __pycall!
@ C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:42 [inlined]
[8] _pycall!(ret::PyObject, o::PyObject, args::Tuple{Matrix{Float64}, Int64}, nargs::Int64, kw::Ptr{Nothing})
@ PyCall C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:29
[9] _pycall!(ret::PyObject, o::PyObject, args::Tuple{Matrix{Float64}, Int64}, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ PyCall C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:11
[10] (::PyObject)(::Matrix{Float64}, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(),
Tuple{}}})
@ PyCall C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:86
[11] (::PyObject)(::Matrix{Float64}, ::Vararg{Any})
@ PyCall C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:86
[12] t(Train::Matrix{Float64}, k::Int64)
@ Main .\REPL[12]:2
[13] top-level scope
@ REPL[20]:1
The libpython
and related stuff configuration:
julia> PyCall.libpython
"C:\\Users\\Shayan\\AppData\\Local\\Programs\\Python\\Python311\\python311.dll"
julia> PyCall.pyversion
v"3.11.0"
julia> PyCall.current_python()
"C:\\Users\\Shayan\\AppData\\Local\\Programs\\Python\\Python311\\python.exe"
But if I say:
julia> sk = pyimport("sklearn")
julia> model = sk.cluster.KMeans(3)
PyObject KMeans(n_clusters=3)
julia> model.fit(Train)
sys:1: ConvergenceWarning: Number of distinct clusters (1) found smaller than n_clusters (3). Possibly due to duplicate points in X.
PyObject KMeans(n_clusters=3)
julia> model.labels_
1611-element Vector{Int32}:
0
0
0
0
0
0
⋮
But I need it to work in a function. As you can see, it doesn't throw AttributeError("'KMeans' object has no attribute 'labels_'")
anymore in this case.
It seems this would work:
KMeans = pyimport("sklearn.cluster").KMeans
silhouette_score = pyimport("sklearn.metric").silhouette_score
Train = rand(Float64, 1611, 10);
function test(Train, k)
model = KMeans(k)
model.fit(Train)
return silhouette_score(Train, model.labels_)
end
julia> test(Train, 3)
0.7885442174636309