Tags: tensorflow, tensorflow-hub

How to decode token ids into words?


The text preprocessing models on the hub describe how to convert an input sentence, like "I am a boy", into token ids, but they do not show how to convert those token ids back into words. I also checked the transformer-encoders documentation, but I still could not find any clue.

I did find a detokenize example, but I could not figure out whether the token ids used in tf-text are the same as the ids produced by the bert_en_uncased_preprocess models.


Solution

  • One option is to use the assets/vocab.txt file in the model directory. Each line holds one token, and the zero-based line number should correspond to the token id (the first line is id 0).
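
The lookup described above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names (load_vocab, ids_to_tokens, join_wordpieces) are hypothetical, and a small toy vocabulary written to a temporary file stands in for the real assets/vocab.txt so the snippet runs without downloading a model. For the real model, vocab_path would point at assets/vocab.txt inside the downloaded model directory instead.

```python
import os
import tempfile


def load_vocab(vocab_path):
    """Read a vocab.txt file into a list so that vocab[token_id] is the token."""
    with open(vocab_path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def ids_to_tokens(token_ids, vocab):
    """Look up each id by its zero-based line position in the vocab file."""
    return [vocab[i] for i in token_ids]


def join_wordpieces(tokens, special=("[PAD]", "[CLS]", "[SEP]")):
    """Drop special tokens and glue '##' continuation pieces onto the previous word."""
    words = []
    for tok in tokens:
        if tok in special:
            continue
        if tok.startswith("##") and words:
            words[-1] += tok[2:]
        else:
            words.append(tok)
    return " ".join(words)


# Toy vocabulary standing in for assets/vocab.txt (hypothetical contents;
# the ids below are NOT the ids of the real BERT vocabulary).
toy_vocab = ["[PAD]", "[CLS]", "[SEP]", "i", "am", "a", "boy", "play", "##ing"]

with tempfile.TemporaryDirectory() as tmp:
    vocab_path = os.path.join(tmp, "vocab.txt")
    with open(vocab_path, "w", encoding="utf-8") as f:
        f.write("\n".join(toy_vocab) + "\n")
    vocab = load_vocab(vocab_path)

# Ids as a preprocessing model might emit them: [CLS] ... [SEP] plus padding.
tokens = ids_to_tokens([1, 3, 4, 5, 6, 2, 0, 0], vocab)
sentence = join_wordpieces(tokens)
print(tokens)
print(sentence)
```

Because the uncased model lowercases its input and splits rare words into WordPiece fragments, a lookup like this recovers the tokens rather than the exact original text; the "##" merge step is a simple approximation of detokenization.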