I am trying to query my CSV files using LangChain and the OpenAI API. The code runs, but I am not sure why the results are limited to only 4 records out of the 500 rows in the CSV.
When I print the output of csv_loader, it shows all the records, so I must be doing something wrong in the embeddings/vector store step. Can anyone suggest what I can try?
csv_loader = CSVLoader(csv_file_path)
data = csv_loader.load()
splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=500,
    chunk_overlap=0,
    length_function=len,
)
documents = splitter.split_documents(data)
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(documents, embeddings)
vectorstore.save_local("faiss_index_constitution")
persisted_vectorstore = FAISS.load_local("faiss_index_constitution", embeddings, allow_dangerous_deserialization=True)
query = "What's the sum of amount of the transactions since 1 March 2024?"
retriever = persisted_vectorstore.as_retriever()
chain = RetrievalQA.from_llm(llm=model, retriever=retriever, verbose=True)
chain_input = {"query": query, "context": None}
result = chain(chain_input)
return result
The retriever returns 4 documents by default (source code). You can change how many documents are retrieved by setting k in search_kwargs:
retriever = persisted_vectorstore.as_retriever(
    search_kwargs={"k": 50}
)
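To see why only 4 rows come back, here is a minimal pure-Python sketch of top-k similarity search. It is not LangChain's or FAISS's actual implementation: the `cosine` and `top_k` helpers and the toy 2-dimensional vectors are hypothetical stand-ins for the embedded CSV rows. It only illustrates that the retriever ranks every stored vector but returns just the first k.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k vectors most similar to the query.

    k=4 mirrors the default mentioned above; this helper is hypothetical.
    """
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# 500 toy vectors standing in for the 500 embedded CSV rows (made-up data).
doc_vecs = [[1.0, i / 500.0] for i in range(500)]
query_vec = [1.0, 0.0]

default_hits = top_k(query_vec, doc_vecs)       # uses k=4
wider_hits = top_k(query_vec, doc_vecs, k=50)   # explicit k, as in search_kwargs

print(len(default_hits), len(wider_hits))  # prints: 4 50
```

All 500 vectors are scored in both calls; only the cut-off changes. Raising k passes more rows to the LLM, but it still sees only the k closest matches, never the whole file.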