Tags: python, langchain, llama, ollama

Langchain, Ollama, and Llama 3 prompt and response


Currently, I am getting back multiple responses, or the model doesn't know when to end a response, and it seems to repeat the system prompt in the response(?). I simply want a single response back. My setup is very simple, so I imagine I am missing implementation details, but what can I do to return only a single response?

from langchain_community.llms import Ollama

llm = Ollama(model="llama3")

def get_model_response(user_prompt, system_prompt):
    prompt = f"""
        <|begin_of_text|>
        <|start_header_id|>system<|end_header_id|>
        { system_prompt }
        <|eot_id|>
        <|start_header_id|>user<|end_header_id|>
        { user_prompt }
        <|eot_id|>
        <|start_header_id|>assistant<|end_header_id|>
        """
    response = llm.invoke(prompt)
    return response

Solution

  • Using a PromptTemplate from Langchain and setting a stop token for the model, I was able to get back a single, correct response.

    from langchain_community.llms import Ollama
    from langchain_core.prompts import PromptTemplate # Added
    
    llm = Ollama(model="llama3", stop=["<|eot_id|>"]) # Added stop token
    
    def get_model_response(user_prompt, system_prompt):
        # NOTE: Not an f-string, and no whitespace inside the curly braces
        template = """
            <|begin_of_text|>
            <|start_header_id|>system<|end_header_id|>
            {system_prompt}
            <|eot_id|>
            <|start_header_id|>user<|end_header_id|>
            {user_prompt}
            <|eot_id|>
            <|start_header_id|>assistant<|end_header_id|>
            """
    
        # Added prompt template
        prompt = PromptTemplate(
            input_variables=["system_prompt", "user_prompt"],
            template=template
        )
        
        # Modified invoking the model
        response = llm.invoke(prompt.format(system_prompt=system_prompt, user_prompt=user_prompt))
        
        return response
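
For illustration, a call might look like the following; the example prompts are placeholders, not from the original question:

    # Placeholder prompts for demonstration purposes only
    system_prompt = "You are a helpful assistant. Answer concisely."
    user_prompt = "Why is the sky blue?"

    # With the <|eot_id|> stop token set, the model stops at the end of its
    # turn and returns a single assistant response as a string.
    print(get_model_response(user_prompt, system_prompt))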