Currently, I am getting back multiple responses, or the model doesn't know when to end a response, and it seems to repeat the system prompt in its output(?). I simply want a single response back. My setup is very simple, so I imagine I am missing some implementation detail, but what can I do to return only a single response?
from langchain_community.llms import Ollama
llm = Ollama(model="llama3")
def get_model_response(user_prompt, system_prompt):
    prompt = f"""
<|begin_of_text|>
<|start_header_id|>system<|end_header_id|>
{ system_prompt }
<|eot_id|>
<|start_header_id|>user<|end_header_id|>
{ user_prompt }
<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
"""
    response = llm.invoke(prompt)
    return response
Using a PromptTemplate from LangChain and setting a stop token for the model, I was able to get a single correct response.
from langchain_community.llms import Ollama
from langchain import PromptTemplate # Added
llm = Ollama(model="llama3", stop=["<|eot_id|>"]) # Added stop token
def get_model_response(user_prompt, system_prompt):
    # NOTE: Not an f-string, and no whitespace inside the curly braces
    template = """
<|begin_of_text|>
<|start_header_id|>system<|end_header_id|>
{system_prompt}
<|eot_id|>
<|start_header_id|>user<|end_header_id|>
{user_prompt}
<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
"""
    # Added prompt template
    prompt = PromptTemplate(
        input_variables=["system_prompt", "user_prompt"],
        template=template,
    )
    # Modified: format the template, then invoke the model
    response = llm.invoke(prompt.format(system_prompt=system_prompt, user_prompt=user_prompt))
    return response
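
For completeness, here is a minimal usage sketch (the prompts are made up for illustration); it assumes a local Ollama server is running and the llama3 model has already been pulled:

# Hypothetical example call to the function defined above
answer = get_model_response(
    user_prompt="What is the capital of France?",
    system_prompt="You are a concise assistant. Answer in one short sentence.",
)
print(answer)

The stop sequence is the key part: llama3 marks the end of each turn with <|eot_id|>, so without it the raw completion endpoint tends to keep generating past the assistant turn, which is why the output looked like several responses and echoed the prompt headers.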