python · huggingface-transformers

Truncate output of Hugging Face pipeline for Facebook/Opt LLM to one word


I'm working with Hugging Face's pipeline module to perform text generation with the facebook/opt model. Ideally, I would like the model to output only one word: the expected next word, given the input string.

Currently, my code looks something like this:

from transformers import pipeline
generator = pipeline('text-generation', model="facebook/opt-1.3b")

generate_input = "This is a"
generate_length = (len(generate_input.split()) * 2) + 5
generate_output = generator(generate_input, max_length=generate_length)

Note: I specify max_length because otherwise I get a warning when I pass in long input strings.

Obviously, I can remove any excess output after the fact, but my goal is to provide arguments so that the pipeline outputs only the single next word. In theory, reducing the amount of output reduces the number of predictions and, in turn, the computation time. So how can I tell the pipeline to output only the single next word?


Solution

  • All you need is the max_new_tokens parameter:

    from transformers import pipeline
    generator = pipeline('text-generation', model="facebook/opt-1.3b")
    
    generate_input = "This is a"
    generate_output = generator(generate_input, max_new_tokens=1)
    print(generate_output)
    

    Output:

    [{'generated_text': 'This is a great'}]
    

    Please note that a token is not the same as a word. Depending on your model's tokenizer, one word may span several tokens, so you might need a larger max_new_tokens value plus some post-processing to recover exactly one word.
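
    The post-processing mentioned above could look like the following sketch. It assumes the pipeline's usual return format (a list of dicts with a generated_text key that echoes the prompt); the next_word helper is a hypothetical name, not part of the transformers API. You would generate a few new tokens (e.g. max_new_tokens=5) and then truncate to the first whole word:

    ```python
    def next_word(prompt, generated):
        """Return only the first whole word the model added after the prompt.

        `generated` is the pipeline's output: a list of dicts, each with a
        'generated_text' key containing prompt + continuation.
        """
        text = generated[0]["generated_text"]
        continuation = text[len(prompt):]  # strip the echoed prompt
        words = continuation.split()       # split the continuation into words
        return words[0] if words else ""   # empty string if nothing was added


    # Example using the output shown above (no model download needed here):
    prompt = "This is a"
    generated = [{"generated_text": "This is a great"}]
    print(next_word(prompt, generated))  # -> great
    ```

    This keeps the generation call cheap (only a handful of new tokens) while guaranteeing the final result is a single word regardless of how the tokenizer splits it.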