Tags: python, langchain, large-language-model, llama

LangChain + local LLAMA compatible model


I'm trying to set up a local chatbot demo for testing purposes. I want to use LangChain as the framework and LLaMA as the model. The tutorials I found all involve some registration, an API key, Hugging Face, etc., which seems unnecessary for my purpose.

Is there a way to use a local LLaMA-compatible model file just for testing purposes? Example code for using the model with LangChain would also be appreciated. Thanks!

UPDATE: I wrote a blog post based on the accepted answer.


Solution

  • No registration is required to run local models from ecosystems like Hugging Face (HF), and using LangChain involves no registration either. Models for local inference are distributed in formats such as GGUF and its predecessor GGML, both of which can be found on HF; it is crucial to match the format when loading and running a model locally.

    For instance, consider TheBloke's Llama-2-7B-Chat-GGUF model, a relatively compact 7-billion-parameter model suitable for running on a modern CPU/GPU. To run it, we can use LangChain's LlamaCpp wrapper, which is backed by llama.cpp:

    def llamacpp():
        # Requires: pip install langchain langchain-community llama-cpp-python
        from langchain_community.llms import LlamaCpp
        from langchain.prompts import PromptTemplate
        from langchain.chains import LLMChain

        llm = LlamaCpp(
            model_path="models/Llama-2-7B-Chat-GGUF/llama-2-7b-chat.Q4_0.gguf",
            n_gpu_layers=40,  # layers to offload to the GPU
            n_batch=512,      # tokens processed per batch
            verbose=True,
        )

        template = """Question: {question}

    Answer: Let's work this out in a step by step way to be sure we have the right answer."""

        prompt = PromptTemplate(template=template, input_variables=["question"])

        llm_chain = LLMChain(prompt=prompt, llm=llm)
        question = "Who is Bjarne Stroustrup and how is he related to programming?"
        print(llm_chain.run(question))

    And we get output from the LLM:

    1. Bjarne Stroustrup is a Danish computer scientist who created C++. - He was born in Aarhus, Denmark on August 5, 1950 and earned his PhD from Cambridge University in 1983. - In 1979 he began developing the programming language C++, which was initially called "C with Classes". - C++ was first released in 1983 and has since become one of the most popular programming languages in use today.

    2. Bjarne Stroustrup is known for his work on the C programming language and its extension to C++. - He wrote The C Programming Language, a book that helped establish C as a widely used language. - He also wrote The Design and Evolution of C++, a detailed explanation of how he created C++ and why he made certain design choices.

    ...
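    One caveat worth noting: the chat variants of Llama 2 were fine-tuned on a specific `[INST]`/`<<SYS>>` prompt markup, and LlamaCpp passes the prompt string through verbatim, so responses tend to be better when the prompt follows that shape. A minimal sketch of a helper that wraps a question in the markup (the function name and default system message are my own):

```python
def format_llama2_prompt(user_message: str,
                         system_message: str = "You are a helpful assistant.") -> str:
    """Wrap a user message in the [INST]/<<SYS>> markup Llama-2-chat was tuned on."""
    return (
        f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

print(format_llama2_prompt("Who is Bjarne Stroustrup?"))
```

    The resulting string can be passed directly to `llm` or embedded in the `PromptTemplate` above.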

    In this instance, I cloned TheBloke's model repository from HF and positioned it in a directory named models/. The final path for the model became models/Llama-2-7B-Chat-GGUF/llama-2-7b-chat.Q4_0.gguf:

    # Make sure you have git-lfs installed (https://git-lfs.com)
    git lfs install
    git clone https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF
    
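    Since a typo in `model_path` only surfaces as a load error later, a quick sanity check after cloning is to list the GGUF files that are actually on disk. A small sketch (`find_gguf_models` is a hypothetical helper, not part of LangChain):

```python
from pathlib import Path

def find_gguf_models(models_dir: str = "models") -> list:
    """Recursively list every *.gguf file under the models directory."""
    return sorted(Path(models_dir).rglob("*.gguf"))

# Print whatever GGUF files are available locally.
for path in find_gguf_models():
    print(path)
```

    Any path printed here can be used verbatim as the `model_path` argument.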

    Although the model can run on a CPU, I ran it locally on a Windows PC with an RTX 4070, which gave good inference performance.
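
    For a CPU-only machine, the same code works with no layers offloaded: set `n_gpu_layers=0`, and optionally raise `n_threads` so llama.cpp uses all available cores. A sketch of the two configurations (`llamacpp_kwargs` is my own helper; the numbers are starting points to tune for your hardware, not canonical values):

```python
import multiprocessing

def llamacpp_kwargs(cpu_only: bool = False) -> dict:
    """Build keyword arguments for LlamaCpp for GPU or CPU-only inference."""
    kwargs = {
        "model_path": "models/Llama-2-7B-Chat-GGUF/llama-2-7b-chat.Q4_0.gguf",
        "n_batch": 512,
        "verbose": True,
    }
    if cpu_only:
        kwargs["n_gpu_layers"] = 0                         # keep all layers on the CPU
        kwargs["n_threads"] = multiprocessing.cpu_count()  # use every available core
    else:
        kwargs["n_gpu_layers"] = 40                        # offload layers to the GPU
    return kwargs

print(llamacpp_kwargs(cpu_only=True))
```

    These can then be passed as `LlamaCpp(**llamacpp_kwargs())` in place of the hard-coded arguments above.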