
The [cosine] similarity does not support vectors with zero magnitude in Elasticsearch


I was trying to ingest a fake dataset into Elasticsearch. The dataset contains a lot of zero vectors (0, 0, ..., 0), and ingestion failed with an error about zero vectors.

{'errors': True, 'took': 64, 'items': [{'create': {'_index': 'bdp-cus-feature-store-1.1-vector-p1', '_id': '5aeefc2e656b6f77c60af0d90f1280b52e298441d29a80c69321155aa9700438', 'status': 400, 'error': {'type': 'document_parsing_exception', 'reason': '[1:3931] failed to parse: The [cosine] similarity does not support vectors with zero magnitude. Preview of invalid vector: [0.0, 0.0, 0.0, 0.0, 0.0, ...]', 'caused_by': {'type': 'illegal_argument_exception', 'reason': 'The [cosine] similarity does not support vectors with zero magnitude. Preview of invalid vector: [0.0, 0.0, 0.0, 0.0, 0.0, ...]'}}}}

I thought Elasticsearch was merely a database, so I am confused about why ingesting zero vectors is not allowed. I searched online for a solution but couldn't find any post about this error or how to solve it. Does anyone have any idea? Many thanks.


Solution

  • Elasticsearch is not merely a database; among other capabilities, it also works as a vector database. When you index dense vectors, Elasticsearch maintains a special data structure called a Hierarchical Navigable Small World (HNSW) graph. This structure enables fast approximate k-nearest neighbors (kNN) lookups during searches.

    The arrangement of vectors in this graph is based on the similarity between vectors. By default, Elasticsearch uses cosine similarity for this purpose. As explained in this answer (https://stackoverflow.com/a/26703445/783043), cosine similarity is undefined for zero vectors because it divides by the vector's magnitude, which is why Elasticsearch rejects them.

    The solution to this issue is either to switch to a different similarity measure or to avoid indexing the field altogether if you do not plan to search it.

    Solution 1 (switch to a different similarity measure, here l2_norm):

    DELETE test
    
    PUT test
    {
      "mappings": {
        "properties": {
          "vector": {
            "type": "dense_vector",
            "dims": 3,
            "similarity": "l2_norm"
          }
        }
      }
    }
    
    POST test/_bulk?refresh=true
    { "index": { "_id": "1" } }
    { "vector": [1, 5, -20]}
    { "index": { "_id": "2" } }
    { "vector": [0, 0, 0]}
    

    Solution 2 (do not index the field):

    DELETE test
    
    PUT test
    {
      "mappings": {
        "properties": {
          "vector": {
            "type": "dense_vector",
            "dims": 3,
            "index": false
          }
        }
      }
    }
    
    POST test/_bulk?refresh=true
    { "index": { "_id": "1" } }
    { "vector": [1, 5, -20]}
    { "index": { "_id": "2" } }
    { "vector": [0, 0, 0]}
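
To make the reason for the error concrete, here is a minimal Python sketch of the two measures involved. It is not Elasticsearch's implementation; the helper names (`magnitude`, `cosine_similarity`, `l2_distance`) are made up for illustration, and returning `None` for the undefined case is an assumption of this sketch (Elasticsearch rejects the document instead). It shows that cosine similarity has a zero denominator as soon as one vector is all zeros, while the Euclidean (L2) distance used in Solution 1 stays well defined.

```python
import math


def magnitude(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b.

    Returns None when either vector has zero magnitude, because the
    formula dot(a, b) / (|a| * |b|) would divide by zero.
    """
    denom = magnitude(a) * magnitude(b)
    if denom == 0.0:
        return None  # undefined for zero vectors
    return sum(x * y for x, y in zip(a, b)) / denom


def l2_distance(a, b):
    """Euclidean distance between a and b; defined for any pair of vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    doc = [1, 5, -20]   # same values as document 1 in the examples above
    zero = [0, 0, 0]    # same values as document 2

    print("cosine(doc, doc): ", cosine_similarity(doc, doc))
    print("cosine(doc, zero):", cosine_similarity(doc, zero))  # None: undefined
    print("l2(doc, zero):    ", l2_distance(doc, zero))
```

Note that L2 distance and cosine similarity can rank neighbors differently, so switching the similarity may change search results for the non-zero vectors as well.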