Tags: elasticsearch, opensearch, amazon-opensearch

k-NN multiple field search in OpenSearch


Assume that we have this index in OpenSearch:

 {
    "settings": {
        "index.knn": True,
        "number_of_replicas": 0,
        "number_of_shards": 1,
    },
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "tag": {"type": "text"},
            "e1": {
                "type": "knn_vector",
                "dimension": 512,
                "method": {
                    "name": "hnsw",
                    "space_type": "cosinesimil",
                    "engine": "nmslib",
                    "parameters": {"ef_construction": 512, "m": 24},
                },
            },
            "e2": {
                "type": "knn_vector",
                "dimension": 512,
                "method": {
                    "name": "hnsw",
                    "space_type": "cosinesimil",
                    "engine": "nmslib",
                    "parameters": {"ef_construction": 512, "m": 24},
                },
            },
            "e3": {
                "type": "knn_vector",
                "dimension": 512,
                "method": {
                    "name": "hnsw",
                    "space_type": "cosinesimil",
                    "engine": "nmslib",
                    "parameters": {"ef_construction": 512, "m": 24},
                },
            },
        }
    },
}

And we want to perform a search over all the fields (approximate k-NN for the vector fields). What would be the correct way to do this in OpenSearch?

I have this query that works, but I'm not sure whether it is the correct way of doing this or whether it uses approximate k-NN:

{
    "size": 10,
    "query": {
        "bool": {
            "should": [
                {
                    "function_score": {
                        "query": {
                            "knn": {
                                "e1": {
                                    "vector": [0, 1, 2, 3],
                                    "k": 10,
                                },
                            }
                        },
                        "weight": 1,
                    }
                },
                {
                    "function_score": {
                        "query": {
                            "knn": {
                                "e2": {
                                    "vector": [0, 1, 2, 3],
                                    "k": 10,
                                },
                            }
                        },
                        "weight": 1,
                    }
                },
                {
                    "function_score": {
                        "query": {
                            "knn": {
                                "e3": {
                                    "vector": [0, 1, 2, 3],
                                    "k": 10,
                                },
                            }
                        },
                        "weight": 1,
                    }
                },
                {
                    "function_score": {
                        "query": {
                            "match": {"title": "title"}
                        },
                        "weight": 0.1,
                    }
                },
                {
                    "function_score": {
                        "query": {"match": {"tag": "tag"}},
                        "weight": 0.1,
                    }
                },
            ]
        }
    },
    "_source": False,
}

In other words, I want to know how this, which is for Elasticsearch, can be done in OpenSearch.

Edit 1: I want to use this new Elasticsearch feature in OpenSearch. The question is how to do that, and also what exactly the query above does.


Solution

  • Searching multiple kNN fields is not yet supported in a released version of Elasticsearch. The development work is tracked in issue #91187 and PR #92118, which was merged for version 8.7; the current version is 8.6.

    The OpenSearch k-NN documentation does not currently mention this. However, according to this GitHub issue, it seems possible to search multiple vector fields in a single search request using either a boolean query or a dis_max query.

    The solution proposed in that comment is a neural query using script_score, but the same approach should also work with the knn query. For example:

    {
        "query": {
            "bool": {
                "should": [{
                        "knn": {
                            "vector_field_1": {
                                "vector": [
                                    -0.009013666,
                                    -0.07266349,
                                    ......,
                                    -0.1163235
                                ],
                                "k": 100
                            }
                        }
                    },
                    {
                        "knn": {
                            "vector_field_2": {
                                "vector": [
                                    -0.003729963,
                                    0.14770366,
                                    ......,
                                    0.032361716
                                ],
                                "k": 100
                            }
                        }
                    },
                    {
                        "match": {
                            "general_text": "apple"
                        }
                    }
                ]
            }
        }
    }
    

    Keep in mind that vector is the query vector (i.e. query text encoded into the corresponding vectors) that must have the same number of dimensions as the vector field you are searching against (512 in your example).
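
    As a rough sketch of the two points above, the request body can be assembled in Python with a check that each query vector has the dimension declared in the mapping. The helper name build_multi_field_query and its parameters are made up for this illustration, and the constant placeholder vector stands in for a real encoded query. The use_dis_max flag switches to the dis_max form mentioned in the GitHub issue.

```python
import json


def build_multi_field_query(query_vectors, text_matches=None, k=10, size=10,
                            dimension=512, use_dis_max=False):
    """Build one search body with a knn clause per vector field.

    query_vectors: dict mapping a knn_vector field name to its query vector.
    text_matches:  optional dict mapping a text field name to the query text.
    """
    clauses = []
    for field, vector in query_vectors.items():
        # The query vector must match the dimension declared in the mapping.
        if len(vector) != dimension:
            raise ValueError(
                f"{field}: expected {dimension} dimensions, got {len(vector)}"
            )
        clauses.append({"knn": {field: {"vector": list(vector), "k": k}}})
    for field, text in (text_matches or {}).items():
        clauses.append({"match": {field: text}})

    if use_dis_max:
        # dis_max scores a document by its best-matching clause.
        query = {"dis_max": {"queries": clauses}}
    else:
        # bool/should adds up the scores of all matching clauses.
        query = {"bool": {"should": clauses}}
    return {"size": size, "query": query, "_source": False}


# Placeholder vector (an assumption); in practice this is the encoded query.
query_vector = [0.1] * 512
body = build_multi_field_query(
    {"e1": query_vector, "e2": query_vector, "e3": query_vector},
    text_matches={"title": "title", "tag": "tag"},
)
print(len(body["query"]["bool"]["should"]))  # 3 knn + 2 match clauses
print(json.dumps(body["query"]["bool"]["should"][-1]))
```

    The resulting dict can be sent as the body of a search request. Passing the 4-element vector from the question would raise a ValueError here, since the fields are declared with 512 dimensions.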

    If your intent is to recalculate the relevance score of documents that are returned using a function that you define, function_score or script_score could be used.

    function_score still seems to be supported in OpenSearch, although the documentation is not exhaustive. Whether to use it depends on what you are looking for, but a weight of 1 certainly does not change the score. You can set the explain parameter to true to see the explained output and understand how the score is combined:

    GET /_search?explain=true
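
    To make the effect of the weights concrete, here is a simplified model of how the query in the question would score a single hit. It assumes that function_score multiplies the clause score by its weight (the default behaviour, as far as I know) and that bool/should adds up the scores of the clauses that matched. The per-clause scores are invented numbers, so treat this as a mental model to check against the explain output, not as the actual scoring code.

```python
def combined_score(clause_scores, weights):
    """Simplified model of the question's query for one document.

    clause_scores: per-clause score of the document, None if the clause
                   did not match that document.
    weights:       the function_score weight wrapped around each clause.
    """
    total = 0.0
    for score, weight in zip(clause_scores, weights):
        if score is not None:
            # function_score: clause score times weight (multiply mode);
            # bool/should: matching clauses are added together.
            total += score * weight
    return total


# Weights from the question: 1 for e1, e2, e3 and 0.1 for title, tag.
weights = [1, 1, 1, 0.1, 0.1]
# Made-up scores for one hit that matched e1, e3 and title only.
hit = [0.9, None, 0.8, 2.0, None]
print(round(combined_score(hit, weights), 6))  # 1.9
```

    Under these assumptions the clauses with weight 1 pass their score through unchanged, while the text clauses contribute a tenth of their score.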
    

    Finally, if you are interested in vector search with OpenSearch, we recently wrote a blog post in which we provide a detailed description of the new neural search plugin introduced with version 2.4.0 through an end-to-end testing experience.