Does Weaviate support dot product similarity when using the Python SDK?


I have saved vectors in Weaviate that I want to query using the dot product. I'm using the Python SDK and I don't see any way of specifying this. Does anyone know whether this is possible?


Solution

  • Hi and thanks for your question.

    The short answer, as of writing this, is "not yet, but soon", but that needs some elaboration.

    Distance Functions

    Generally, distance functions in Weaviate are entirely pluggable: anything that can produce a score can be plugged in. For example, see this folder, which even contains a file named dot_product.go. That file exists because, to calculate cosine similarity internally, Weaviate normalizes all vectors on read and then simply calculates the dot product.
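
    To illustrate why a dot product routine is enough for cosine similarity, here is a minimal Python sketch. It is not Weaviate's Go code; the function names are made up for this example. It only shows that the dot product of two length-normalized vectors equals their cosine similarity.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale v to unit length (hypothetical helper, not a Weaviate API)."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Textbook definition: dot(a, b) / (|a| * |b|)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def cosine_via_normalized_dot(a, b):
    """Normalize first, then take the dot product."""
    return dot(normalize(a), normalize(b))


a = [3.0, 4.0]
b = [4.0, 3.0]

# Both routes give the same value (up to floating-point rounding),
# while the raw dot product of the unnormalized vectors is much larger.
print(cosine_similarity(a, b))
print(cosine_via_normalized_dot(a, b))
print(dot(a, b))
```

    The raw dot product of the unnormalized vectors is not limited to any fixed range, which matters for the API question discussed next.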

    APIs

    So, if Weaviate can calculate the dot product, why can't you select this option? The reason is a past decision to introduce the certainty field in the API. This field is used to return scores and to limit results by score. The original idea behind certainty was to have a single metric that produces a number between 0 and 1 to indicate the distance. With cosine similarity that's simple: it is already bounded to the range [-1, 1], so it is easy to transform into a certainty. With an unbounded score such as the dot product, this isn't so easy.
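
    A small Python sketch of the problem follows. The linear rescaling used here is an assumption chosen for illustration (the text above does not spell out the exact formula), and both function names are hypothetical.

```python
def cosine_to_certainty(cosine):
    """Map a cosine similarity in [-1, 1] onto [0, 1].

    Assumed mapping for illustration: a simple linear rescaling.
    """
    return (1.0 + cosine) / 2.0


def is_valid_certainty(value):
    """A certainty is only meaningful inside [0, 1]."""
    return 0.0 <= value <= 1.0


# Bounded input: every cosine similarity lands inside [0, 1].
for cosine in (-1.0, 0.0, 1.0):
    assert is_valid_certainty(cosine_to_certainty(cosine))

# Unbounded input: a raw dot product (for example 24.0) has no fixed
# range, so the same rescaling produces a value outside [0, 1].
raw_dot_product = 24.0
print(cosine_to_certainty(raw_dot_product))
print(is_valid_certainty(cosine_to_certainty(raw_dot_product)))
```

    Any fixed rescaling runs into the same issue, because there is no upper or lower bound on the dot product to rescale against.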

    Path forward

    Here is a discussion on this topic. Feel free to participate in this discussion. The current favorite option is to deprecate certainty and expose the raw values as either score or distance.

    Any quickfixes?

    We could easily enable new distance scores, such as dot product, before the above-mentioned API issue is solved, possibly as an experimental feature behind a feature flag. However, you would not be able to see the resulting scores/distances in the APIs.

    Timelines

    I expect the above-mentioned issue to be resolved within a couple of weeks of writing this (April 28, 2022).