I'm building a semantic search engine by encoding the objects in my database into 512-dimensional vectors, then encoding the query, and finally running a k-NN search to find results. The results are good, but...
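My current pipeline looks roughly like this (a sketch; `encode` is a deterministic stand-in for my actual embedding model so the example is self-contained):

```python
import numpy as np

def encode(text: str) -> np.ndarray:
    # Stand-in for the real embedding model: a deterministic fake
    # 512-dim unit vector, just so this sketch runs on its own.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(512)
    return v / np.linalg.norm(v)

# Encode every database object once, up front.
object_texts = ["first object", "second object", "third object"]
index = np.stack([encode(t) for t in object_texts])  # (n_objects, 512)

def knn_search(query: str, k: int = 2) -> np.ndarray:
    # Brute-force k-NN by cosine similarity (vectors are unit-norm,
    # so the dot product is the cosine similarity).
    sims = index @ encode(query)
    return np.argsort(-sims)[:k]  # indices of the k nearest objects
```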
I want to try augmenting my objects with additional categories from Wikipedia, so for each object I may get zero or more additional vectors (depending on how many of its words are found in Wikipedia).
My idea is to use `numpy.average` on all encoded vectors (per object) and then run my regular k-NN search.
Is this an optimal approach? I'm worried that averaging the vectors might not give accurate results.
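To make the idea concrete, here is a sketch of the per-object step (`base_vector` is the object's own encoding and `extra_vectors` the zero or more Wikipedia-derived ones; the names are just illustrative):

```python
import numpy as np

def object_vector(base_vector, extra_vectors):
    # Stack the object's own vector with its Wikipedia-derived vectors
    # and average them component-wise. With zero extra vectors this
    # degenerates to the base vector, so unaugmented objects still work.
    stacked = np.vstack([base_vector, *extra_vectors])  # (1 + n_extra, 512)
    return np.average(stacked, axis=0)
```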
`numpy.average` indeed works pretty well for this task, and I'm satisfied with the approach overall. I hope this info will be handy for someone.
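For completeness, the whole combining step is a single call; `np.average` also accepts an optional `weights=` argument if the object's own vector should count more than the Wikipedia ones (the weights below are just an example):

```python
import numpy as np

# One object vector plus three Wikipedia category vectors, stacked row-wise.
vectors = np.random.default_rng(0).standard_normal((4, 512))

combined = np.average(vectors, axis=0)                        # plain mean
weighted = np.average(vectors, axis=0, weights=[3, 1, 1, 1])  # favour the object's own vector

assert combined.shape == weighted.shape == (512,)
```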