I am working on an implementation of K-means clustering in Julia.
The task is: figure out and implement a modification of k-means that instead measures similarity by the angle between vectors.
I assumed that cosine similarity could be used for this. My code works for regular K-means, where I calculate the squared Euclidean distance like this:
Distances[:,i] = sum((X.-C[[i],:]).^2, dims=2) # Where C is center, Distances are added using the i-th center
I tried to do the same with cosine similarity, like this:
Distances[:, i] = sum(1 .- ((X*C[[i], :]).^2 /(sum(X.^2, dims=2).*(C[[i],:]'*C[[i],:]))))
But this does not seem to work.
Have I misunderstood the question or am I implementing it wrong?
In my Beta Machine Learning Package, module Utils, I implemented the distances as:
using LinearAlgebra
"""L1 norm distance (aka _Manhattan Distance_)"""
l1_distance(x,y) = sum(abs.(x-y))
"""Euclidean (L2) distance"""
l2_distance(x,y) = norm(x-y)
"""Squared Euclidean (L2) distance"""
l2²_distance(x,y) = norm(x-y)^2
"""Cosine distance"""
cosine_distance(x,y) = 1 - dot(x,y)/(norm(x)*norm(y)) # 1 - cosine similarity, so that identical directions give 0
I then use them in the cluster module.
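As a rough sketch of how such a distance function could plug into the assignment step of k-means (this is not the package's actual code; `cosine_dist` and `assign_clusters` are hypothetical names, and all rows of `X` and `C` are assumed to be nonzero vectors):

```julia
using LinearAlgebra

# Hypothetical helper: 1 - cosine similarity, so smaller means "closer in angle"
cosine_dist(x, y) = 1 - dot(x, y) / (norm(x) * norm(y))

# Hypothetical helper: assign each row of X to the nearest row of C under `dist`
function assign_clusters(X, C; dist=cosine_dist)
    n, k = size(X, 1), size(C, 1)
    labels = Vector{Int}(undef, n)
    for j in 1:n
        # Distance from record j to each of the k centers
        d = [dist(view(X, j, :), view(C, i, :)) for i in 1:k]
        labels[j] = argmin(d)
    end
    return labels
end

X = [1.0 0.0; 2.0 0.1; 0.0 3.0; 0.2 1.0]
C = [1.0 0.0; 0.0 1.0]
labels = assign_clusters(X, C)
```

Passing the distance as a keyword argument means the same loop can be reused with a Euclidean or Manhattan distance by swapping one function.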
Note that you need the standard library package LinearAlgebra (it provides norm and dot).
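For the matrix form used in the question, one possible vectorized sketch is shown below. It assumes `X` is an n×d matrix of records and `C` a k×d matrix of centers, both without all-zero rows; `cosine_distances` is a hypothetical name, not a function from any package. Two things stand out in the attempt from the question: `X*C[[i], :]` multiplies an n×d matrix by a 1×d matrix, which only has matching dimensions when d is 1, and the outer `sum` without `dims` collapses everything to a single scalar instead of one value per record.

```julia
using LinearAlgebra

# Hypothetical helper: cosine distance from every row of X to every row of C
function cosine_distances(X, C)
    Xn = sqrt.(sum(abs2, X, dims=2))      # n×1 column of row norms of X
    Cn = sqrt.(sum(abs2, C, dims=2))      # k×1 column of row norms of C
    # X * C' holds all pairwise dot products; Xn .* Cn' broadcasts to n×k
    return 1 .- (X * C') ./ (Xn .* Cn')
end

X = [1.0 0.0; 0.0 2.0; 1.0 1.0]
C = [1.0 0.0; 0.0 1.0]
Distances = cosine_distances(X, C)        # Distances[:, i] is the column for the i-th center
```

Each column of the result plays the role of `Distances[:, i]` in the question, so the rest of the K-means loop can stay unchanged.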