python openai-api gpt-3 langchain chatgpt-api

How to get more detailed results sources with Langchain

I am trying to put together a simple "Q&A with sources" using Langchain and a specific URL as the source data. The URL consists of a single page with quite a lot of information on it.

The problem is that RetrievalQAWithSourcesChain is only giving me the entire URL back as the source of the results, which is not very useful in this case.

Is there a way to get more detailed source info? Perhaps the heading of the specific section on the page? A clickable URL to the correct section of the page would be even more helpful!

I am slightly unsure whether the generating of the result source is a function of the language model, URL loader or simply RetrievalQAWithSourcesChain alone.

I have tried using UnstructuredURLLoader and SeleniumURLLoader with the hope that perhaps more detailed reading and input of the data would help - sadly not.

Relevant code excerpt:

llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')
chain = RetrievalQAWithSourcesChain.from_llm(llm=llm, retriever=VectorStore.as_retriever())

result = chain({"question": question})

print(result['answer'])
print("\n Sources : ",result['sources'] )

Solution

ChatGPT is very flexible, and the more explicit you are better results you can get. This link show the docs for the function you are using. there is a parameter for langchain.prompts.BasePromptTemplate that allows you to give ChatGPT more explicit instructions.

It looks like the base prompt template is this

Use the following knowledge triplets to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\n{context}\n\nQuestion: {question}\nHelpful Answer:

You can add in another sentence giving ChatGPT more clear instructions

Please format the answer with JSON of the form { "answer": "{your_answer}", "relevant_quotes": ["list of quotes"] }. Substitutde your_answer as the answer to the question, but also include relevant quotes from the source material in the list.

You may need to tweak it a little bit to get ChatGPT responding well. Then you should be able to parse it.

ChatGPT has 3 message types in the API

User - a message from an end user to the model
model - a message from the model to the end user
system - a message from the prompt engineer to model to add instructions. Lang chain doesn't use this since it's a one-shot prompt

I strongly recommend these courses on ChatGPT since they are from Andrew Ng and very high quality.