openai-api, langchain

Build a chatbot with custom data using Langchain


I am trying to understand GPT/LangChain. I want to use only my own data, but I am not able to find a basic example.

For example, I envision my chat looking something like this:

USER: show me a way to build a tree house

GPT: To build a tree house you need the following materials and tools.....

My own data is in a file mydata.txt with the following content:

To build a tree house you need the following tool hammer , nails and materials wood...
....
.....

Can you please show a simple example of how this can be done?


Solution

  • Summary

    You need to use the Vector DB Text Generation tool in LangChain; this tool lets you use your own documents as context for the chatbot's answers. The example I will give below is slightly different from the chain in the documentation, but I found it works better. Also, the documentation talks mostly about getting text from a GitHub repo, which isn't your case, I suppose.

    The code below is written in Python.

    Load the imports, the LLM model, and the document splitter:

    from langchain.embeddings.openai import OpenAIEmbeddings
    from langchain.vectorstores import Chroma
    from langchain.text_splitter import CharacterTextSplitter
    from langchain.document_loaders import TextLoader
    from langchain.prompts import PromptTemplate
    from langchain.chat_models import ChatOpenAI

    loader = TextLoader("mydata.txt")  # put the path and name of the file here; if it's in the same directory as the code file you can just use the file name
    documents = loader.load()
    llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)  # change the model to the one you want to use; tweak the temperature to see which gives better answers
    text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)  # you can set the size of each chunk taken from your own doc
    texts = text_splitter.split_documents(documents)
    embeddings = OpenAIEmbeddings()  # this will create the vector embeddings of your text
    docsearch = Chroma.from_documents(texts, embeddings)
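
    If you want intuition for what chunking does before spending API calls, here is a rough plain-Python sketch of character-based splitting with `chunk_size` and `chunk_overlap`. This is a simplified illustration, not the real `CharacterTextSplitter` (which also splits on separator characters); `split_text` is a hypothetical helper:

    ```python
    def split_text(text, chunk_size=500, chunk_overlap=0):
        """Naive sketch: slide a chunk_size window, stepping by
        chunk_size - chunk_overlap so consecutive chunks share text."""
        chunks = []
        step = chunk_size - chunk_overlap
        for start in range(0, len(text), step):
            chunk = text[start:start + chunk_size]
            if chunk:
                chunks.append(chunk)
        return chunks

    # 1200 characters, 500-char chunks, 100-char overlap
    chunks = split_text("a" * 1200, chunk_size=500, chunk_overlap=100)
    print([len(c) for c in chunks])  # [500, 500, 400]
    ```

    Overlap is useful when a sentence that answers a question straddles a chunk boundary: with overlap, at least one chunk contains the whole sentence.
    
    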
    

    Create the prompt template:

    from langchain.chains import LLMChain

    prompt_template = """Use the context below to write a 400 word blog post about the topic below:
        Context: {context}
        Topic: {topic}
        Blog post:"""
    # this is the standard prompt template; you can change and experiment with it
    PROMPT = PromptTemplate(
        template=prompt_template, input_variables=["context", "topic"]
    )

    chain = LLMChain(llm=llm, prompt=PROMPT)
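
    For intuition, filling this template behaves much like plain `str.format` substitution. A quick plain-Python sketch of what the model ends up seeing, using sample values taken from your question (the real `PromptTemplate` adds validation of the input variables, but the substitution is the same idea):

    ```python
    prompt_template = """Use the context below to write a 400 word blog post about the topic below:
        Context: {context}
        Topic: {topic}
        Blog post:"""

    # fill the two variables the way LLMChain does before calling the model
    filled = prompt_template.format(
        context="To build a tree house you need the following tools: hammer, nails, and materials: wood...",
        topic="how to build a tree house",
    )
    print(filled)
    ```

    The retrieved chunk goes into `{context}` and your question goes into `{topic}`, so the model answers from your document rather than from its general training data.
    
    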
    

    Create the function that generates the post, and run it:

    def generate_blog_post(topic):
        # k is how many chunks of context are retrieved for each search;
        # more can give more context, but it costs more tokens and can
        # sometimes even confuse the model, so test it and be aware
        docs = docsearch.similarity_search(topic, k=4)
        inputs = [{"context": doc.page_content, "topic": topic} for doc in docs]
        print(chain.apply(inputs))

    generate_blog_post("your question/subject")
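
    Note that `chain.apply` runs the chain once per input dict and returns one result per input, so with `k=4` you get four generated texts, one per retrieved chunk. A hypothetical stand-in to show the shape of the data (`fake_llm` and `apply_chain` are placeholders for the real model call and chain, not LangChain APIs):

    ```python
    def fake_llm(prompt):
        # placeholder for the real model call
        return f"(answer based on: {prompt[:30]}...)"

    def apply_chain(inputs, template):
        # like chain.apply: fill the template and call the model once per input
        return [{"text": fake_llm(template.format(**inp))} for inp in inputs]

    results = apply_chain(
        [{"context": "use a hammer and nails", "topic": "tree house"},
         {"context": "use wood planks", "topic": "tree house"}],
        "Context: {context}\nTopic: {topic}",
    )
    print(len(results))  # one generated text per retrieved chunk
    ```
    
    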