amazon-web-services, vectorization, vector-search

Vector Search in AWS


I need to create a vector database in AWS. I was using Pinecone in my POC, but for safety reasons the company needs something inside AWS. I saw some people recommending OpenSearch, but I read in a blog that OpenSearch doesn't really do vector search:

Documented in https://www.elastic.co/blog/text-similarity-search-with-vectors-in-elasticsearch the approach to vector search has exactly the same limitation as what we observed with Solr: it will retrieve all documents that match the search criteria (keyword query along with filters on document attributes), and score all of them with the vector similarity of choice (cosine distance, dot-product or L1/L2 norms). That is, vector similarity will not be used during retrieval (first and expensive step): it will instead be used during document scoring (second step). Therefore, since you can’t know in advance, how many documents to fetch to surface most semantically relevant, the mathematical idea of vector search is not really applied.

Does anyone know of an alternative, or is OpenSearch the best we can do in AWS? I read some people talking about using DynamoDB too, but I didn't fully understand how it works. If anyone has any ideas or suggestions, I would really appreciate it.

Source: https://towardsdatascience.com/speeding-up-bert-search-in-elasticsearch-750f1f34f455
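
To make the limitation in the quoted passage concrete, here is a minimal pure-Python sketch of the two behaviours. It is only an illustration: the toy corpus `DOCS`, the 2-dimensional vectors, and the function names `filter_then_rescore` and `exhaustive_nearest` are made up for this example and are not part of Elasticsearch or OpenSearch.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus: each document has text and a made-up embedding.
DOCS = [
    {"id": 1, "text": "red running shoes", "vec": [1.0, 0.0]},
    {"id": 2, "text": "blue running shoes", "vec": [0.6, 0.8]},
    {"id": 3, "text": "crimson sneakers", "vec": [0.8, 0.6]},
]


def filter_then_rescore(keyword, query_vec, docs=DOCS):
    """Two-step approach from the quote: retrieve by keyword, then score by vector."""
    # Step 1 (retrieval): only the keyword decides which documents are fetched.
    candidates = [d for d in docs if keyword in d["text"]]
    # Step 2 (scoring): vector similarity only reorders what step 1 returned.
    return sorted(candidates, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)


def exhaustive_nearest(query_vec, k, docs=DOCS):
    """Vector similarity drives retrieval itself (brute force, for illustration)."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    query = [0.8, 0.6]
    print([d["id"] for d in filter_then_rescore("shoes", query)])
    print([d["id"] for d in exhaustive_nearest(query, k=1)])
```

In this toy data, document 3 is the closest vector to the query but contains no matching keyword, so the two-step approach never fetches it. That is the gap the quoted passage is describing.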


Solution

  • Amazon OpenSearch has a vector-based search plugin called k-NN, and it has experimental features that let users perform semantic search.

    Reference: K-NN
    AWS K-NN
    Semantic Search feature
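
    As a rough sketch of what using the k-NN plugin looks like, the snippet below only builds the JSON request bodies for creating a k-NN index and running a `knn` query, as I understand them from the plugin documentation. The field name `embedding`, the dimension of 3, and the helper names are assumptions for illustration; check the linked references for the options your OpenSearch version supports.

```python
import json


def knn_index_body(field="embedding", dimension=3):
    """Request body for creating an index with a k-NN vector field.

    `field` and `dimension` are placeholder values for this sketch; the
    dimension must match the size of the embeddings you index.
    """
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field: {"type": "knn_vector", "dimension": dimension}
            }
        },
    }


def knn_query_body(query_vec, k=5, field="embedding"):
    """Request body asking for the k nearest neighbours of `query_vec`."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(query_vec), "k": k}}},
    }


if __name__ == "__main__":
    # Print the bodies; sending them is left to whichever client you use.
    # With the opensearch-py client this would be roughly
    # client.indices.create(index=..., body=...) and
    # client.search(index=..., body=...).
    print(json.dumps(knn_index_body(), indent=2))
    print(json.dumps(knn_query_body([0.1, 0.2, 0.3], k=2), indent=2))
```

    With a query like this, the vector is used to find the nearest neighbours during retrieval, instead of only rescoring documents that a keyword query already matched.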