
How do I decode the output of a pytorch OpenAIGPTModel?


I am trying to decode the outputs of a pytorch OpenAIGPTModel, but I can't see how to go about it, and I can't find any complete examples online.

I've found only this much:

from transformers import OpenAIGPTTokenizer, OpenAIGPTModel
tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
model = OpenAIGPTModel.from_pretrained('openai-gpt')

inputs = tokenizer("How does a kite fly?", return_tensors="pt")
outputs = model(**inputs)

outputs has an attribute last_hidden_state which is a torch.FloatTensor of shape (batch_size, sequence_length, hidden_size). I've tried grabbing the first vector of length hidden_size and calling tokenizer.decode(vector.tolist()), but I get:

'<unk><unk><unk><unk>'

I've also tried interpreting my last_hidden_state as a series of probabilities for each token in the vocabulary with tokenizer.decode(torch.argmax(last_hidden_state, 2)[0].tolist()), but that also outputs nonsense:

'¨ şore şhave ▪'

Solution

What the model returns in this case isn't actually a representation of tokens.

There are many ways a language model can return results: logits scoring each possible token ID, softmaxed probabilities over the vocabulary, a vector representing the embedding of a single token, and so on.

Given that you get a tensor of shape batch x seq_len x hidden_size, each vector of length hidden_size is probably the model's contextual embedding of a single token position, not anything indexed by the vocabulary. It's the "lm_head" that converts such an embedding into scores over token IDs.
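To make the distinction concrete, here is a minimal sketch of what an LM head does, shape-wise. The Linear layer below is randomly initialized purely for illustration (a real head has learned weights), so its output is meaningless; it only shows the hidden_size -> vocab_size projection:

import torch
from transformers import OpenAIGPTTokenizer, OpenAIGPTModel

tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
model = OpenAIGPTModel.from_pretrained('openai-gpt')

inputs = tokenizer("How does a kite fly?", return_tensors="pt")
hidden = model(**inputs).last_hidden_state    # (batch, seq_len, hidden_size)

# An LM head is essentially a linear projection from hidden_size to vocab_size.
# This one is untrained, so it only illustrates the shapes involved.
lm_head = torch.nn.Linear(model.config.n_embd, tokenizer.vocab_size, bias=False)
logits = lm_head(hidden)                      # (batch, seq_len, vocab_size)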

Checking the docs for that model, I see:

    The bare OpenAI GPT transformer model outputting raw hidden-states without any specific head on top.

Without any head, you're not getting tokens, just embeddings that can be used to predict tokens. You probably want something like OpenAIGPTLMHeadModel, which has an LM head on top. The example given for that model outputs logits, so from those you might use argmax to select specific token IDs. Once you have token IDs, you can turn them back into text with something like tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(tokens)), or simply tokenizer.decode(tokens).
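Putting that together, a rough sketch of greedy decoding with OpenAIGPTLMHeadModel might look like the following. Keep in mind that the logits at position i score the token at position i + 1, so taking the argmax at every position gives one predicted token per prefix of the prompt rather than a fluent reply:

import torch
from transformers import OpenAIGPTTokenizer, OpenAIGPTLMHeadModel

tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
model = OpenAIGPTLMHeadModel.from_pretrained('openai-gpt')

inputs = tokenizer("How does a kite fly?", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# logits: (batch, seq_len, vocab_size) -- one score per vocabulary entry,
# per input position; position i predicts the token at position i + 1.
logits = outputs.logits

# Greedy choice: pick the highest-scoring token ID at each position...
predicted_ids = torch.argmax(logits, dim=-1)[0].tolist()

# ...then map the IDs back to text.
print(tokenizer.decode(predicted_ids))
# equivalent:
print(tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(predicted_ids)))

To actually continue the prompt, you would take only the prediction at the last position, append it to the input, and repeat, or use the model's generate method, which runs that loop for you.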

You may be wondering why a user would want a model that returns logits instead of tokens. Callers don't always want to select the single "most likely" next token: by returning logits (or a probability distribution), the model lets the caller sample from the likely next tokens instead.
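For example, continuing from the OpenAIGPTLMHeadModel sketch above (reusing its logits and tokenizer), sampling the next token instead of taking the argmax might look roughly like this; the temperature of 0.8 is just an illustrative choice:

import torch

# Distribution over the vocabulary for the token that follows the prompt:
# use the logits at the last input position only.
last_logits = logits[0, -1, :]            # (vocab_size,)

# Temperature < 1 sharpens the distribution, > 1 flattens it.
probs = torch.softmax(last_logits / 0.8, dim=-1)

# Draw one token ID from the distribution rather than always taking the mode.
next_id = torch.multinomial(probs, num_samples=1).item()
print(tokenizer.decode([next_id]))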