I'm building a Q&A chatbot with LangChain and Qdrant.
I'm trying to configure LangChain to use Qdrant in a multitenant environment. The Qdrant docs say that the best approach in my case is "partition by payload": store a group_id (for example group_id = OneClient) in the payload of every element of a collection, so that searches can then be filtered on that group_id (which in my case will be the client). Here is the link to the docs: https://qdrant.tech/documentation/tutorials/multiple-partitions/
I have added a "group_id" metadata field to the documents that I'm saving in Qdrant through LangChain.
I'd like to understand how to filter on group_id when I query through LangChain. This is how I'm using LangChain to retrieve the answer to a question:
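# Import paths as in the classic langchain package; they may differ in newer releases.
from langchain.chains import RetrievalQAWithSourcesChain
from langchain.chat_models import ChatOpenAI
from langchain.vectorstores import Qdrant
from qdrant_client import QdrantClient

qdrant = Qdrant(
    client=QdrantClient(...),
    collection_name="collection1",
    embeddings=embeddings
)
prompt = ...
llm = ChatOpenAI(...)
qa_chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm,
    chain_type="stuff",
    return_source_documents=True,
    retriever=qdrant.as_retriever(),  # no filter yet: searches every client's documents
    chain_type_kwargs={"prompt": prompt}
)
result = qa_chain({"question": question})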
The group_id represents the client, and it is known before the question is asked.
Any help is much appreciated. Thanks.
I have found the answer. Thanks for all the suggestions.
To filter on the "group_id" attribute, which holds the client ID, I set the metadata field group_id = client on each document when I load data with "VectorStore.from_documents". Then I pass a search filter through the "as_retriever" function, so that only sources with that group_id are returned:
chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm,
    chain_type=chain_type,
    max_tokens_limit=max_tokens_limit,
    return_source_documents=True,
    retriever=vectorstore.as_retriever(
        # Restrict the search to documents whose group_id metadata equals this client
        search_kwargs={'filter': {'group_id': client}}
    ),
    reduce_k_below_max_tokens=False,
    chain_type_kwargs={"prompt": prompt}
)
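For anyone who wants to see the idea in isolation, here is a minimal sketch of the two halves of this approach: tagging each document with a group_id, and building the per-client filter. It uses only the standard library. The names Doc, tag_with_group, tenant_search_kwargs and matches are hypothetical helpers I made up for illustration, and the in-memory matches check only mimics an exact-match metadata filter; it is not how Qdrant evaluates filters internally.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain document: text plus a metadata dict."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def tag_with_group(docs: List[Doc], group_id: str) -> List[Doc]:
    """Return copies of docs with metadata['group_id'] set to the given client."""
    return [Doc(d.page_content, {**d.metadata, "group_id": group_id}) for d in docs]


def tenant_search_kwargs(group_id: str) -> Dict[str, Any]:
    """Build a search_kwargs dict of the same shape as in the answer above."""
    return {"filter": {"group_id": group_id}}


def matches(doc: Doc, flt: Dict[str, Any]) -> bool:
    """Illustration only: exact match on every key of the filter."""
    return all(doc.metadata.get(key) == value for key, value in flt.items())


# Two clients share one collection; each document carries its owner's group_id.
docs = (
    tag_with_group([Doc("Invoice FAQ")], "client_a")
    + tag_with_group([Doc("Shipping FAQ")], "client_b")
)

kwargs = tenant_search_kwargs("client_a")
visible = [d.page_content for d in docs if matches(d, kwargs["filter"])]
print(visible)
```

With the real chain, the dict returned by tenant_search_kwargs(client) would be passed as search_kwargs to vectorstore.as_retriever(...). Since the filter is fixed when the retriever is created, it seems safest to build the retriever (and chain) per client or per request rather than sharing one instance across clients.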