c# .net machine-learning cosine-similarity

What is the fastest way to calculate the cosine similarity of one vector to many in .NET?


Below is the code I'm currently using. I'm comparing a vector of 768 floats against 50k others, and it takes about 800 ms. I assume there's a much faster implementation, either in C# or in some package that does the calculation natively, but I'm having trouble finding one. Thanks!

// USAGE:
// vectors is IEnumerable<float[]>, each array holding 768 floats
// vector is float[] holding 768 floats

vectors.Select(v => v.DotProductSum(vector) * 100)

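// Requires: using System.Collections.Generic; using System.Linq;
// Must be declared inside a static class to work as an extension method.
public static float DotProductSum(this IEnumerable<float> values, IEnumerable<float> other)
{
    // Multiplies the vectors element-wise and sums the products (a dot product).
    // This equals cosine similarity only when both vectors have unit length.
    return values.Zip(other, (d1, d2) => d1 * d2).Sum();
}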
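
Much of the time in a LINQ-based dot product like the one above probably goes to per-element delegate calls and enumerator overhead rather than to the arithmetic. A minimal sketch of the same dot product written as a plain loop over `System.Numerics.Vector<float>` is shown here for comparison. The names `VectorMath` and `Dot` are hypothetical, and no timing is claimed for this version.

```csharp
using System;
using System.Numerics;

public static class VectorMath
{
    // Dot product of two equal-length float arrays, processed in
    // hardware-sized chunks via System.Numerics.Vector<float>.
    public static float Dot(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        int width = Vector<float>.Count;   // floats per SIMD register
        var acc = Vector<float>.Zero;      // running per-lane partial sums
        int i = 0;

        // Multiply and accumulate full-width chunks.
        for (; i <= a.Length - width; i += width)
            acc += new Vector<float>(a, i) * new Vector<float>(b, i);

        // Collapse the lanes into a single scalar.
        float sum = Vector.Dot(acc, Vector<float>.One);

        // Handle any leftover elements that did not fill a chunk.
        for (; i < a.Length; i++)
            sum += a[i] * b[i];

        return sum;
    }
}
```

Keeping the vectors as `float[]` rather than `IEnumerable<float>` is what allows the indexed, chunked access; the result is still a raw dot product, so it matches cosine similarity only for unit-length inputs.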

Solution

  • I found a very fast solution, Faiss, which in my testing queried tens of thousands of 2048-float vectors in under 5 ms. I'm consuming it from .NET, so I used the FaissMask wrapper library. It requires a number of native dependencies, which you can get by building the faiss repo; I haven't found a package that includes them. Specifically, I needed:

    libgcc_s_seh-1.dll
    libgfortran-3.dll
    libopenblas.dll
    libquadmath-0.dll
    faiss.dll
    faiss_c.dll
    

    After that, the code is very straightforward:

    using var index = new FaissMask.IndexFlat((int)embeddingSize, MetricType.MetricInnerProduct);
    index.Add(vectors);
    var queryResults = index.Search(queryVector, 10);
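
One caveat on the snippet above: an inner-product index returns raw dot products, which equal cosine similarity only when every stored vector and the query have unit length. A minimal sketch of an L2-normalization helper follows, assuming the embeddings are plain `float[]` arrays; the names `Embeddings` and `Normalize` are hypothetical and are not part of Faiss or FaissMask.

```csharp
using System;

public static class Embeddings
{
    // Returns a unit-length copy of v. An all-zero vector has no direction,
    // so it is returned as an all-zero array.
    public static float[] Normalize(float[] v)
    {
        // Accumulate in double to limit rounding error on long vectors.
        double sumSquares = 0;
        foreach (float x in v)
            sumSquares += (double)x * x;

        double norm = Math.Sqrt(sumSquares);
        var result = new float[v.Length];
        if (norm == 0)
            return result;

        for (int i = 0; i < v.Length; i++)
            result[i] = (float)(v[i] / norm);

        return result;
    }
}
```

Under that assumption, each vector would be normalized once before it is added to the index and the query vector once before searching, so the returned inner products can be read as cosine similarities.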