python, nlp, huggingface-transformers, large-language-model

How can I run some inference on the MPT-7B language model?


I wonder how I can run some inference on the MPT-7B language model. The documentation page for the MPT-7B language model on Hugging Face doesn't explain how to run inference (i.e., given a few words, predict the next few words).


Solution

  • https://huggingface.co/mosaicml/mpt-30b gives example code for inference:

    import torch
    import transformers
    from transformers import pipeline

    # trust_remote_code is required because MPT uses custom modeling code
    model = transformers.AutoModelForCausalLM.from_pretrained(
        'mosaicml/mpt-30b',
        torch_dtype=torch.bfloat16,  # load in bfloat16 to halve the memory footprint
        trust_remote_code=True
    )
    model.to('cuda')

    # load the matching tokenizer from the same repository
    tokenizer = transformers.AutoTokenizer.from_pretrained('mosaicml/mpt-30b')

    with torch.autocast('cuda', dtype=torch.bfloat16):
        inputs = tokenizer('Here is a recipe for vegan banana bread:\n', return_tensors="pt").to('cuda')
        outputs = model.generate(**inputs, max_new_tokens=100)
        print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

    # or using the HF pipeline
    pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')
    with torch.autocast('cuda', dtype=torch.bfloat16):
        print(
            pipe('Here is a recipe for vegan banana bread:\n',
                 max_new_tokens=100,
                 do_sample=True,
                 use_cache=True))
    

    Just replace mpt-30b with mpt-7b if you wish to use MPT-7B.
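
    As a concrete illustration of that substitution, here is a minimal self-contained MPT-7B sketch. The prompt and max_new_tokens=50 are arbitrary choices for the example, and it assumes the MPT-7B repo loads the same way as MPT-30B, in bfloat16 on a CUDA GPU:

    import torch
    import transformers

    # Assumption: MPT-7B loads identically to MPT-30B above, just with a different repo name
    model = transformers.AutoModelForCausalLM.from_pretrained(
        'mosaicml/mpt-7b',
        torch_dtype=torch.bfloat16,
        trust_remote_code=True
    )
    model.to('cuda')

    tokenizer = transformers.AutoTokenizer.from_pretrained('mosaicml/mpt-7b')

    # Illustrative prompt; any text works
    inputs = tokenizer('The capital of France is', return_tensors='pt').to('cuda')
    with torch.autocast('cuda', dtype=torch.bfloat16):
        outputs = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])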