openai-api langchain

Getting only a limited number of documents with the LangChain CSV loader


I am trying to query my CSV files using LangChain and the OpenAI API. The code runs, but I am not sure why the results are limited to only 4 records out of the 500 rows in the CSV.

When I print the output of csv_loader.load(), it shows all the records, so I assume I am doing something wrong with the embeddings or the vector store. Can anyone suggest what I can try?

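    from langchain.chains import RetrievalQA
    from langchain_community.document_loaders import CSVLoader
    from langchain_community.vectorstores import FAISS
    from langchain_openai import OpenAIEmbeddings
    from langchain_text_splitters import CharacterTextSplitter


    # Hypothetical function name; model is the chat LLM created elsewhere
    def query_csv(csv_file_path, model):
        # Load the CSV; CSVLoader produces one document per row
        csv_loader = CSVLoader(csv_file_path)
        data = csv_loader.load()

        # Split on newlines into chunks of at most 500 characters
        splitter = CharacterTextSplitter(separator="\n",
                                         chunk_size=500,
                                         chunk_overlap=0,
                                         length_function=len)
        documents = splitter.split_documents(data)

        # Embed the chunks, build a FAISS index, persist it, and reload it
        embeddings = OpenAIEmbeddings()
        vectorstore = FAISS.from_documents(documents, embeddings)
        vectorstore.save_local("faiss_index_constitution")
        persisted_vectorstore = FAISS.load_local("faiss_index_constitution", embeddings, allow_dangerous_deserialization=True)
        query = "What's the sum of amount of the transactions since 1 March 2024?"

        # Expose the index as a retriever and build a question-answering chain
        retriever = persisted_vectorstore.as_retriever()
        chain = RetrievalQA.from_llm(llm=model, retriever=retriever, verbose=True)

        chain_input = {"query": query, "context": None}
        result = chain(chain_input)

        return result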
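As a rough check that the loader is not where rows get lost, the pure-Python sketch below approximates what CSVLoader does: it emits one document per data row, writing each column as a "column: value" line and recording the row index in the metadata. The exact formatting, the rows_to_documents helper, and the sample data are assumptions for illustration, not LangChain's own code.

```python
import csv
import io


def rows_to_documents(csv_text, source="transactions.csv"):
    """Approximate CSVLoader (assumed format): one document per data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    docs = []
    for i, row in enumerate(reader):
        # Each column becomes a "column: value" line in the page content
        content = "\n".join(f"{k.strip()}: {v.strip()}" for k, v in row.items())
        docs.append({"page_content": content, "metadata": {"source": source, "row": i}})
    return docs


# Hypothetical 500-row file with date and amount columns
sample = "date,amount\n" + "\n".join(
    f"2024-03-{(i % 28) + 1:02d},{i}" for i in range(500)
)
docs = rows_to_documents(sample)
print(len(docs))                 # 500
print(docs[0]["page_content"])   # date: 2024-03-01 / amount: 0 on two lines
```

Rows this short stay far below chunk_size=500, so the splitter should leave roughly one chunk per row; the 500 rows are most likely still present in the index.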

Solution

  • The retriever returns only 4 documents by default (see the source code), so only the 4 chunks most similar to your query are passed to the LLM. You can retrieve more by setting k in search_kwargs:

    # Return the 50 most similar chunks instead of the default 4
    retriever = persisted_vectorstore.as_retriever(
        search_kwargs={"k": 50}
    )
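For intuition, here is a toy, pure-Python illustration of top-k similarity retrieval. It is not FAISS or LangChain code: the cosine and top_k helpers and the 2-D "embeddings" are hypothetical, and only the default of k=4 mirrors the retriever behaviour described above.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=4):
    """Toy retriever: indices of the k documents most similar to the query."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]


# Hypothetical vectors for 500 rows; real ones would come from OpenAIEmbeddings
doc_vecs = [[1.0, i / 500] for i in range(500)]
query_vec = [1.0, 0.0]

print(top_k(query_vec, doc_vecs))             # [0, 1, 2, 3]
print(len(top_k(query_vec, doc_vecs, k=50)))  # 50
```

Whatever k you choose, the LLM only sees those k chunks, so a question that aggregates over every row (such as summing all amounts since 1 March 2024) can only be answered from the rows that were actually retrieved.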

References

1. Specifying top k (LangChain)