I am trying to index word embedding vectors into an Elasticsearch 8 ANN dense_vector field with dot_product similarity.
I can successfully index vec with cosine similarity, so I converted it to a unit vector with NumPy for dot_product.
unit_vector = vec / np.linalg.norm(vec)
But I get a 400 error like this:
The [dot_product] similarity can only be used with unit-length vectors. Preview of invalid vector: [-0.0038341882, -0.1564709, 0.08771773, -0.14555556, -0.07952896, ...]
Am I missing something?
I was confronted with the exact same problem and I found a solution after much experimentation.
In my case, when indexing lots of embeddings to Elasticsearch (dense_vector with the similarity parameter set to dot_product), most of them were indexed properly, but a small percentage failed with the error: The [dot_product] similarity can only be used with unit-length vectors.
After intensive testing, I found that the problem was the numerical type of my unit vectors: they were np.float16, and this was causing the error. Using np.float32 as the numerical type for the unit vectors in my workflow solved the issue.
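
For illustration, here is a minimal sketch of how the normalization could be done in float32, with a check of how far each version ends up from unit length. The helper names to_unit_float32 and unit_length_error are my own, not part of NumPy or Elasticsearch, and the random 300-dimensional vector only stands in for a real embedding. I don't hard-code the tolerance Elasticsearch applies, since I haven't verified its exact value.

```python
import numpy as np


def to_unit_float32(vec):
    """Normalize a vector to unit length, keeping float32 precision (hypothetical helper)."""
    v = np.asarray(vec, dtype=np.float32)
    norm = np.linalg.norm(v)
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return v / norm


def unit_length_error(vec):
    """Absolute deviation of the squared magnitude from 1, computed in float64."""
    v = np.asarray(vec, dtype=np.float64)
    return abs(float(np.dot(v, v)) - 1.0)


# Stand-in for a real embedding (assumption: 300 dimensions, random values).
rng = np.random.default_rng(0)
raw = rng.normal(size=300)

unit32 = to_unit_float32(raw)
unit16 = unit32.astype(np.float16)  # what a float16 workflow would send

print("float32 deviation:", unit_length_error(unit32))
print("float16 deviation:", unit_length_error(unit16))
```

Running a check like unit_length_error over your vectors before indexing should show which ones drift from unit length after the cast to float16, which would match the pattern of only a small percentage of documents being rejected.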