I am trying to implement an SVM dual optimization problem in Julia. My data set is the 60000×784 MNIST digit recognition dataset. I'm using a one-vs-rest strategy to combine 10 classifiers, so this is just the component for the underlying binary classifier, where $\pmb y_i \in \{-1, 1\}$.
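For concreteness, here is a minimal sketch of the one-vs-rest relabelling I mean (the helper ovr_labels is just illustrative, not part of my actual code):

# Hypothetical helper: relabel the 10-class digits as ±1 for the
# classifier that detects digit k (one-vs-rest).
ovr_labels(digits::AbstractVector{<:Integer}, k::Integer) =
    [d == k ? 1.0 : -1.0 for d in digits]

ovr_labels([0, 3, 3, 7], 3)   # -> [-1.0, 1.0, 1.0, -1.0]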
The linear version is very fast:
function dual_loss(α::Vec, X::Mat, y::Vec) where {T<:AbstractFloat, Vec<:AbstractVector{T}, Mat<:AbstractMatrix{T}}
    t = α .* y                                   # elementwise αᵢyᵢ
    # dual objective as a single BLAS quadratic form; note the ½ factor,
    # matching the kernel version below
    return sum(α) - 0.5 * (t' * X) * (X' * t)
end
The kernel version, however, is extremely slow, and I am not patient enough to wait for it to finish:
using LinearAlgebra: dot

function dual_loss(α::Vec, X::Mat, y::Vec, kernel::Function=dot) where {T<:AbstractFloat, Vec<:AbstractVector{T}, Mat<:AbstractMatrix{T}}
    t = α .* y                                   # elementwise αᵢyᵢ
    # n² kernel evaluations over row views (X is column-major, so X[i, :] is strided)
    return sum(@views 0.5 * t[i] * t[j] * kernel(X[i, :], X[j, :]) for i = 1:size(X, 1), j = 1:size(X, 1)) |> s -> sum(α) - s
end
I also tried the MLKernels.jl library, and it is still very slow. I wonder whether this is due to the scale of the problem itself (I'm on my laptop, running on the CPU) or to my code not being optimized enough. If so, is there any way to improve its performance?
I will give this question a shot, although I am not all that familiar with Julia. An SVM transforms the data you give it into a higher dimension so as to find a linear split between the classes. That brings up some confusion as to how a linear split can separate 10 different numbers (0-9 is what is contained in the MNIST data set).
In the linear code, I'm not sure how that is going to separate the data into different groupings, since I don't believe a linear SVM will do much good here. That is why it runs very fast but is most likely quite inaccurate (I don't know your results, just assuming). One other note: I think it only transforms the data into a slightly larger dimensionality than what it starts at, which won't help much with the separation.
In the kernel code, this SHOULD take longer than the linear version, since, as I understand it, the data is transformed into a much higher dimension before a final separating hyperplane is fitted.
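To make that concrete, here is a minimal sketch of the kernel trick (my own toy example; the feature map φ is only for illustration): a degree-2 polynomial kernel returns the same value as a dot product in a higher-dimensional feature space, without ever building that space.

using LinearAlgebra

x, z = [1.0, 2.0], [3.0, 4.0]

# Explicit degree-2 feature map for 2-D input: φ(v) = (v₁², √2·v₁v₂, v₂²)
φ(v) = [v[1]^2, sqrt(2) * v[1] * v[2], v[2]^2]

dot(φ(x), φ(z))   # 121.0 -- dot product in the 3-D feature space
dot(x, z)^2       # 121.0 -- same value from the kernel, no mapping built

This is why kernel SVMs can separate data that is not linearly separable in the original space, at the price of evaluating the kernel for every pair of points.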
I do not think your code can be optimized much: kernels, especially on very large data sets, are simply very taxing. I have the same problem when running other ML techniques on my computer's CPU; it maxes out at 100% and takes hours to finish. You can always rent space on a cloud server with higher specs to run your code.
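For a rough sense of the scale here (my own back-of-the-envelope arithmetic, assuming Float64 and the full 60000-sample training set):

n, d = 60_000, 784
n^2              # 3.6e9 kernel evaluations in the double sum
n^2 * 8 / 1e9    # ≈ 28.8 GB just to store a dense Gram matrix
n^2 * 2d         # ≈ 5.6e12 floating-point ops for the length-784 dot products

So a single evaluation of the double sum already does billions of kernel calls, before the optimizer has taken a single step.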
I'd also suggest looking into some other methods of classification, as I don't see how one can give a binary classification to 10 different labels.
Let me know if this helps!