Tags: julia, k-means, spherical-kmeans

Calculating Cosine Similarity in Julia for K-Means


I am working on an implementation of K-means clustering in Julia.

The task is to figure out and implement a modification of k-means that instead measures similarity by the angle between vectors.
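For reference, here is the standard way to turn the angle between a point x and a center c into a quantity that k-means can minimize. It is stated as general background, not taken from the task itself: take the cosine of the angle and subtract it from 1.

```latex
\cos\theta(\mathbf{x},\mathbf{c}) = \frac{\mathbf{x}\cdot\mathbf{c}}{\lVert\mathbf{x}\rVert\,\lVert\mathbf{c}\rVert},
\qquad
d_{\cos}(\mathbf{x},\mathbf{c}) = 1 - \cos\theta(\mathbf{x},\mathbf{c}) \in [0, 2]
```

Because d_cos is 0 for vectors pointing the same way and grows with the angle, the usual "assign each point to the center with the smallest distance" step carries over unchanged.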

I assumed that cosine similarity could be used for this. I have already made the code work for regular K-means by computing the squared Euclidean distance like this:

Distances[:,i] = sum((X.-C[[i],:]).^2, dims=2) # C holds one center per row; column i of Distances gets each point's squared distance to center i
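For context, here is a minimal, self-contained sketch of the assignment step built around that line. It assumes X is an n×d data matrix (one point per row) and C is a k×d matrix of centers (one center per row). The function name assign_euclidean is hypothetical. Indexing with C[[i],:] keeps the center as a 1×d matrix, so X .- C[[i],:] broadcasts it across all n rows.

```julia
# Hypothetical helper: assign each row of X to the nearest row of C
# using squared Euclidean distance (the line from the question).
function assign_euclidean(X::AbstractMatrix, C::AbstractMatrix)
    n, k = size(X, 1), size(C, 1)
    Distances = zeros(n, k)
    for i in 1:k
        # C[[i], :] is a 1×d matrix, so it broadcasts against every row of X;
        # sum(...; dims=2) yields an n×1 matrix of squared distances.
        Distances[:, i] = sum((X .- C[[i], :]).^2, dims=2)
    end
    # For each point, pick the column (center) with the smallest distance.
    return [argmin(Distances[j, :]) for j in 1:n]
end

X = [0.0 0.0; 0.1 0.2; 5.0 5.0; 5.1 4.9]
C = [0.0 0.0; 5.0 5.0]
assign_euclidean(X, C)  # returns [1, 1, 2, 2]
```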

I tried to do this by using cosine similarity such as this:

Distances[:, i] = sum(1 .- ((X*C[[i], :]).^2 /(sum(X.^2, dims=2).*(C[[i],:]'*C[[i],:]))))

But this does not seem to work.

Have I misunderstood the question or am I implementing it wrong?


Solution

  • In the Utils module of my Beta Machine Learning package, I implemented the distances as follows:

    using LinearAlgebra
    """L1 norm distance (aka _Manhattan Distance_)"""
    l1_distance(x,y)     = sum(abs.(x-y))
    """Euclidean (L2) distance"""
    l2_distance(x,y)     = norm(x-y)
    """Squared Euclidean (L2) distance"""
    l2²_distance(x,y)    = norm(x-y)^2
    """Cosine distance (1 - cosine similarity, so that smaller means more similar)"""
    cosine_distance(x,y) = 1 - dot(x,y)/(norm(x)*norm(y))
    

    I then use them in the cluster module. Note that norm and dot come from the standard library package LinearAlgebra, which is why it has to be loaded with using LinearAlgebra.
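Applied to the question's layout, here is a sketch that assumes, as there, that X is n×d with one point per row and C is k×d with one center per row. The helpers cosine_distances and assign_cosine are hypothetical names, not part of the package. The distance column for center i can be computed in one vectorized line, with two differences from the attempt in the question. First, it uses 1 minus the cosine similarity, so that argmin still picks the closest center. Second, it takes the center as a plain vector with C[i, :], so X * c is a matrix-vector product giving the n dot products.

```julia
using LinearAlgebra

# Per-pair cosine distance: 1 - cosine similarity, so 0 means "same direction".
cosine_distance(x, y) = 1 - dot(x, y) / (norm(x) * norm(y))

# Hypothetical helper: n×k matrix of cosine distances between the rows of X
# (points) and the rows of C (centers), vectorized per center.
function cosine_distances(X::AbstractMatrix, C::AbstractMatrix)
    n, k = size(X, 1), size(C, 1)
    Distances = zeros(n, k)
    xnorms = vec(sqrt.(sum(X .^ 2, dims=2)))   # ‖x_j‖ for every row j
    for i in 1:k
        c = C[i, :]                            # center i as a plain vector
        # X * c holds the n dot products x_j ⋅ c; divide elementwise by ‖x_j‖‖c‖.
        Distances[:, i] = 1 .- (X * c) ./ (xnorms .* norm(c))
    end
    return Distances
end

# Hypothetical helper: assign each point to the center with the smallest angle.
assign_cosine(X, C) = [argmin(row) for row in eachrow(cosine_distances(X, C))]

X = [1.0 0.1; 10.0 1.0; 0.1 1.0; 2.0 20.0]
C = [1.0 0.0; 0.0 1.0]
assign_cosine(X, C)  # returns [1, 1, 2, 2]
```

Points 1 and 2 (and likewise 3 and 4) differ only in magnitude, so they land in the same cluster. That is the angle-based behavior the task asks for.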