
Customize LLM output with LangChain


I am trying to create an AI chatbot backend with LangChain and FastAPI. I've managed to generate an output based on a user query. The bot embeds a hardcoded context.

However, the response I get includes everything (the query, the full template, as well as the actual bot's answer). Is there any way to get only the bot's answer as the response?

Thank you

template = """
You are Bot, an AI bot that helps visitors of my portfolio if they need it. Always thank the user for showing interest in my portfolio.
Depending on the user's question, point them to the right section of the portfolio.
If they want to contact me, they can use the contact form or visit one of my social media profiles via the links.

My name is John Doe and this is my personal portfolio where I display my interests:
    - chess, with my chess.com stats
    - running, with my Strava stats
    - some pictures showing my accomplishments and my passion for travel and mountains
    - a contact form to reach out to me
    - links to my different social media profiles (Facebook, LinkedIn, GitHub, chess.com and Strava)

User query : {question}
Bot's answer :"""

from fastapi import FastAPI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_community.llms import HuggingFaceHub

app = FastAPI()

@app.post("/conversation")
async def read_conversation(query:str):

    repo_id = "mistralai/Mistral-7B-Instruct-v0.2"

    llm1 = HuggingFaceHub(
        repo_id=repo_id,
        model_kwargs={"temperature": 0.7},
    )

    prompt = PromptTemplate(
        input_variables=["question"], template=template
    )
    chain = LLMChain(llm=llm1, prompt=prompt)
    response = chain.invoke({"question":query})

    return {"response" : response}



Solution

  • There's no guarantee that the LLM will return the expected response; in your case, that means a response that doesn't repeat the original prompt. However, there are a couple of things you can try that may improve your results.

    System Prompt

    Move the template instructions into a system prompt. This sets the context for how the LLM should respond. In a chat context, the LLM shouldn't repeat the system prompt's instructions; it should just respond in a conversational manner. Example:

    from fastapi import FastAPI
    from langchain.chains import LLMChain
    from langchain.prompts import (
        ChatPromptTemplate,
        HumanMessagePromptTemplate,
        SystemMessagePromptTemplate,
    )
    from langchain_community.chat_models.huggingface import ChatHuggingFace
    from langchain_community.llms import HuggingFaceHub
    
    
    sys_template = """
    You are Bot, an AI bot that helps visitors of my portfolio if they need it. Always thank the user for showing interest in my portfolio.
    Depending on the user's question, point them to the right section of the portfolio.
    If they want to contact me, they can use the contact form or visit one of my social media profiles via the links.

    My name is John Doe and this is my personal portfolio where I display my interests:
        - chess, with my chess.com stats
        - running, with my Strava stats
        - some pictures showing my accomplishments and my passion for travel and mountains
        - a contact form to reach out to me
        - links to my different social media profiles (Facebook, LinkedIn, GitHub, chess.com and Strava)"""
    
    app = FastAPI()
    
    
    @app.post("/conversation")
    async def read_conversation(query: str):
    
        repo_id = "mistralai/Mistral-7B-Instruct-v0.2"
    
        llm1 = HuggingFaceHub(
            repo_id=repo_id,
            model_kwargs={"temperature": 0.7},
        )
        chat_model = ChatHuggingFace(llm=llm1)
    
        chat_prompt = ChatPromptTemplate.from_messages([
            SystemMessagePromptTemplate.from_template(sys_template),
            HumanMessagePromptTemplate.from_template("{question}"),
        ])
    
        chain = LLMChain(llm=chat_model, prompt=chat_prompt)
        response = chain.invoke({"question": query})
    
        return {"response": response}
    

    Temperature

    Adjust the temperature. Lowering the temperature toward 0 makes the LLM's responses as deterministic as possible.
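
    For example, a minimal sketch reusing the HuggingFaceHub setup from above (note that some Hugging Face inference endpoints reject a temperature of exactly 0, so a small positive value is a safer stand-in):

    llm1 = HuggingFaceHub(
        repo_id=repo_id,
        # a small positive value keeps the output close to deterministic;
        # exactly 0 may be rejected by the inference endpoint
        model_kwargs={"temperature": 0.01},
    )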

    References

    1. Hugging Face (LangChain)