Currently, I am getting back multiple responses, or the model doesn't know when to end a response, and it seems to repeat the system prompt in its output(?). I simply want a single response back. My setup is very simple, so I imagine I am missing some implementation detail, but what can I do to return only a single response?
from langchain_community.llms import Ollama
llm = Ollama(model="llama3")
def get_model_response(user_prompt, system_prompt):
    prompt = f"""
<|begin_of_text|>
<|start_header_id|>system<|end_header_id|>
{ system_prompt }
<|eot_id|>
<|start_header_id|>user<|end_header_id|>
{ user_prompt }
<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
"""
    response = llm.invoke(prompt)
    return response
Using a PromptTemplate from LangChain and setting a stop token for the model, I was able to get a single correct response.
from langchain_community.llms import Ollama
from langchain import PromptTemplate # Added
llm = Ollama(model="llama3", stop=["<|eot_id|>"]) # Added stop token
def get_model_response(user_prompt, system_prompt):
    # NOTE: Not an f-string, and no whitespace inside the curly braces
    template = """
<|begin_of_text|>
<|start_header_id|>system<|end_header_id|>
{system_prompt}
<|eot_id|>
<|start_header_id|>user<|end_header_id|>
{user_prompt}
<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
"""
    # Added prompt template
    prompt = PromptTemplate(
        input_variables=["system_prompt", "user_prompt"],
        template=template,
    )
    # Modified: format the template, then invoke the model
    response = llm.invoke(prompt.format(system_prompt=system_prompt, user_prompt=user_prompt))
    return response
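
For completeness, here is a minimal usage sketch (the prompts are made up for illustration); it assumes a local Ollama server is running and the llama3 model has already been pulled:

# Hypothetical example call to the function defined above
answer = get_model_response(
    user_prompt="What is the capital of France?",
    system_prompt="You are a concise assistant. Answer in one short sentence.",
)
print(answer)

The stop sequence is the key part: llama3 marks the end of each turn with <|eot_id|>, so without it the raw completion endpoint tends to keep generating past the assistant turn, which is why the output looked like several responses and echoed the prompt headers.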