numpy search semantics knn

Semantic search engine with augmented categories


I'm building a semantic search engine: I encode the objects in the database into 512-dimensional vectors, encode the query the same way, and then use the k-NN algorithm to find results. The results are good, but ..

I want to try augmenting my objects with additional categories from Wikipedia. So for each object I may get zero or more additional vectors (depending on how many of its words are found in Wikipedia).

My idea is to use numpy.average on all encoded vectors (per object) and then use my regular k-NN search.
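A minimal sketch of that idea, assuming each object is stored as a 2-D array whose first row is its original vector and whose remaining rows are the Wikipedia category vectors. The helper names (`average_embedding`, `knn_search`), the random data, and the brute-force cosine k-NN are all illustrative assumptions, not my actual encoder or index:

```python
import numpy as np

def average_embedding(vectors):
    """Collapse one object's vectors (original + category vectors) into a single vector."""
    stacked = np.asarray(vectors, dtype=np.float64)
    return np.average(stacked, axis=0)

def knn_search(index, query, k=3):
    """Brute-force cosine k-NN: return the row indices of the k most similar vectors."""
    index_norm = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    scores = index_norm @ query_norm
    return np.argsort(-scores)[:k]

# Hypothetical data: 5 objects, each with 1 original vector plus 0-3 category vectors.
rng = np.random.default_rng(0)
dim = 512
objects = [rng.normal(size=(1 + n_extra, dim)) for n_extra in [0, 1, 3, 2, 0]]

# One averaged vector per object, then the regular k-NN search over them.
index = np.stack([average_embedding(vectors) for vectors in objects])
query = objects[2][0]  # stand-in for an encoded query
top = knn_search(index, query, k=3)
```

Because every object still ends up as a single 512-dim vector, the existing k-NN index does not need to change; only the step that produces the stored vectors does.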

Is this an optimal approach? I feel that averaging the vectors might not give accurate results.


Solution

  • numpy.average does indeed work pretty well for this task, and I'm satisfied with the approach overall. I hope this info will be handy for someone.
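
For anyone worried that many category vectors could dilute the original one, `numpy.average` also accepts a `weights` argument. The sketch below is only an illustration of that option: the function name `weighted_embedding` and the default weight of 2.0 are hypothetical choices, not something I tuned or needed in my case.

```python
import numpy as np

def weighted_embedding(original, extras, original_weight=2.0):
    """Average an object's vectors, counting the original vector more heavily.

    original_weight is a hypothetical knob; 1.0 reproduces the plain average.
    """
    vectors = np.vstack([original] + list(extras))
    weights = np.array([original_weight] + [1.0] * len(extras))
    return np.average(vectors, axis=0, weights=weights)

# Tiny 2-dim example: the original vector counts three times as much as the extra one.
combined = weighted_embedding(np.array([1.0, 0.0]), [np.array([0.0, 1.0])], original_weight=3.0)
```

With no extra vectors the function simply returns the original vector, so objects without Wikipedia categories are unaffected.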