langchain vector-database faiss retrieval-augmented-generation similarity-search

Is there a way to filter and exclude documents when doing a similarity search in a vector DB using LangChain?


So far my research only shows how to filter to a specific document or page, but not how to exclude certain documents from the search.

results_with_scores = db.similarity_search_with_score("foo", filter=dict(page=1))

Solution

  • This depends on the underlying vector database being used. The arguments to filter are typically passed through to the vector database, so the behavior is implementation-specific.

    A common choice of vector database is ChromaDB. There, the filter arguments are passed to where, so you can consult the filter documentation in the ChromaDB guide. See also the inequality section in the unofficial ChromaDB cookbook. To exclude documents, you will want to use a $ne expression.

    If you are using a different vector database, you may need to consult the documentation for that database, and possibly check the LangChain code to see how the filter is passed through.
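
As a rough sketch of the above, here are two small helpers for building an exclusion filter. The helper names and the metadata keys ("page", "source") are made up for illustration. The first builds a Chroma-style where dict using $ne for a single value or $nin for several. The second builds a plain predicate over a document's metadata, which can be used to post-filter results yourself or, as far as I know, passed as a callable filter to LangChain's FAISS store. Check your installed version before relying on either.

```python
from typing import Any, Callable, Dict, Iterable


def chroma_exclude_filter(key: str, excluded: Iterable[Any]) -> Dict[str, Any]:
    """Build a Chroma-style `where` dict that excludes the given metadata values.

    Hypothetical helper: `key` is whatever metadata field your documents carry.
    """
    values = list(excluded)
    if not values:
        raise ValueError("need at least one value to exclude")
    if len(values) == 1:
        # Single value: plain inequality.
        return {key: {"$ne": values[0]}}
    # Several values: "not in" list.
    return {key: {"$nin": values}}


def metadata_exclude_predicate(
    key: str, excluded: Iterable[Any]
) -> Callable[[Dict[str, Any]], bool]:
    """Return a predicate that is True for metadata that should be kept."""
    blocked = set(excluded)
    return lambda metadata: metadata.get(key) not in blocked


# Assumed usage with a LangChain Chroma store named `db` (not defined here):
# results = db.similarity_search_with_score(
#     "foo", filter=chroma_exclude_filter("page", [1])
# )
#
# Assumed usage with a LangChain FAISS store, passing a callable as the filter:
# results = db.similarity_search_with_score(
#     "foo", filter=metadata_exclude_predicate("page", [1])
# )
```

If a backend accepts neither form, the predicate can still be applied to the returned (document, score) pairs after the search. In that case, request more results than you need, since some will be dropped.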