Tags: langchain, large-language-model, llama, ctransformers

Number of tokens exceeded maximum limit


I am using the Llama 2 quantized model from Hugging Face and loading it using ctransformers from langchain. When I run the query, I get the warning below:

Number of tokens (512) exceeded maximum context length (512)

Below is my code:

from langchain.llms import CTransformers
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

llm = CTransformers(model='models_k/llama-2-7b-chat.ggmlv3.q2_K.bin',
                    model_type='llama',
                    config={'max_new_tokens': 512,
                            'temperature': 0.01})

B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

DEFAULT_SYSTEM_PROMPT="""\
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible. 
Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. 
If you don't know the answer to a question, please don't share false information."""

# db_schema is defined elsewhere and holds the database schema as text
instruction = db_schema + " Based on the database schema provided to you \n Convert the following text from natural language to sql query: \n\n {text} \n only display the sql query"

SYSTEM_PROMPT = B_SYS + DEFAULT_SYSTEM_PROMPT + E_SYS

template = B_INST + SYSTEM_PROMPT + instruction + E_INST

prompt = PromptTemplate(template=template, input_variables=["text"])
LLM_Chain = LLMChain(prompt=prompt, llm=llm)
print(LLM_Chain.run("List the names and prices of electronic products that cost less than $500."))

Can anyone tell me why I am getting this warning? Do I have to change any settings?


Solution

  • You can fix this by increasing the context length, which defaults to 512 in ctransformers, using the context_length option in the config.

    For example:

    # Raise context_length above the default 512 so the prompt
    # plus the generated tokens fit in the context window.
    llm = CTransformers(model='models_k/llama-2-7b-chat.ggmlv3.q2_K.bin',
                        model_type='llama',
                        config={'max_new_tokens': 600,
                                'temperature': 0.01,
                                'context_length': 700})
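
    As a sanity check, you can count how many tokens the rendered prompt actually uses and size context_length to cover the prompt plus max_new_tokens. A minimal sketch, assuming the langchain wrapper exposes the underlying ctransformers model as llm.client and that its tokenize method returns the prompt's token ids:

    # Render the full prompt that the chain would send to the model.
    rendered = prompt.format(text="List the names and prices of electronic "
                                  "products that cost less than $500.")

    # Assumption: llm.client is the underlying ctransformers model and
    # tokenize() returns a list of token ids for the given text.
    n_prompt_tokens = len(llm.client.tokenize(rendered))
    print(f"Prompt tokens: {n_prompt_tokens}")

    # context_length must hold the prompt plus the generated tokens, so
    # with max_new_tokens=600 it should be at least n_prompt_tokens + 600.

    If the printed count plus max_new_tokens still exceeds 700, raise context_length further (or trim the schema text in the prompt).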