python, nlp, huggingface-transformers, large-language-model

How can I run some inference on the MPT-7B language model?


I wonder how I can run some inference on the MPT-7B language model. The documentation page for the MPT-7B language model on Hugging Face doesn't explain how to run inference (i.e., given a few words, predict the next few words).


Solution

  • https://huggingface.co/mosaicml/mpt-30b gives example code for inference:

    import torch
    import transformers
    from transformers import pipeline

    # trust_remote_code is required because MPT uses custom modeling code
    model = transformers.AutoModelForCausalLM.from_pretrained(
        'mosaicml/mpt-30b',
        torch_dtype=torch.bfloat16,  # load in bfloat16 to halve the memory footprint
        trust_remote_code=True
    )
    model.to('cuda')

    # load the matching tokenizer from the same repository
    tokenizer = transformers.AutoTokenizer.from_pretrained('mosaicml/mpt-30b')

    with torch.autocast('cuda', dtype=torch.bfloat16):
        inputs = tokenizer('Here is a recipe for vegan banana bread:\n', return_tensors="pt").to('cuda')
        outputs = model.generate(**inputs, max_new_tokens=100)
        print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

    # or using the HF pipeline
    pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')
    with torch.autocast('cuda', dtype=torch.bfloat16):
        print(
            pipe('Here is a recipe for vegan banana bread:\n',
                 max_new_tokens=100,
                 do_sample=True,
                 use_cache=True))
    

    Just replace mpt-30b with mpt-7b if you wish to use MPT-7B.
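
    As a concrete illustration of that substitution, here is a minimal self-contained MPT-7B sketch. The prompt and max_new_tokens=50 are arbitrary choices for the example, and it assumes the MPT-7B repo loads the same way as MPT-30B, in bfloat16 on a CUDA GPU:

    import torch
    import transformers

    # Assumption: MPT-7B loads identically to MPT-30B above, just with a different repo name
    model = transformers.AutoModelForCausalLM.from_pretrained(
        'mosaicml/mpt-7b',
        torch_dtype=torch.bfloat16,
        trust_remote_code=True
    )
    model.to('cuda')

    tokenizer = transformers.AutoTokenizer.from_pretrained('mosaicml/mpt-7b')

    # Illustrative prompt; any text works
    inputs = tokenizer('The capital of France is', return_tensors='pt').to('cuda')
    with torch.autocast('cuda', dtype=torch.bfloat16):
        outputs = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])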