Tags: google-colaboratory, huggingface-transformers, huggingface-tokenizers

Hugging Face ALBERT tokenizer NoneType error with Colab


I simply tried the sample code from the Hugging Face model page: https://huggingface.co/albert-base-v2

from transformers import AlbertTokenizer, AlbertModel
tokenizer = AlbertTokenizer.from_pretrained('albert-base-v2')
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')

Then I got the following error at the tokenizer step, encoded_input = tokenizer(text, return_tensors='pt'):

TypeError: 'NoneType' object is not callable

I tried the same code on my local machine and it worked without a problem, so the issue seems specific to Colab. However, I do need to run this model on a Colab GPU.

My Python version on Colab is 3.6.9.


Solution

  • I found the answer. After installing transformers, importing AlbertTokenizer, and calling AlbertTokenizer.from_pretrained(...), I received an error asking me to install the SentencePiece package. However, after installing that package and running the tokenizer again in the same session, I started getting the error above. So I opened a brand-new Colab session, installed everything including SentencePiece before creating the tokenizer, and this time it worked. The NoneType error simply means the tokenizer was never created: from_pretrained could not build the tokenizer for albert-base-v2 because SentencePiece was not available when transformers was imported. If you install the packages in the right order, before importing anything from transformers, Colab picks up the dependency correctly. In short, for this to work in Colab (a sketch of the cell order follows the list below):

    1. Open a new Colab session.
    2. Install Transformers and SentencePiece.
    3. Import AlbertTokenizer.
    4. Create the tokenizer.
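
    For reference, a minimal sketch of that cell order, plus an optional step to move the model onto the Colab GPU (this assumes a GPU runtime is selected; the pip line is meant to be run in its own cell):

    # Cell 1: install both packages before importing anything from transformers.
    # !pip install transformers sentencepiece
    # (If transformers was already imported in this session, restart the runtime first.)

    # Cell 2: import and build the tokenizer only after the install step.
    import torch
    from transformers import AlbertTokenizer, AlbertModel

    tokenizer = AlbertTokenizer.from_pretrained('albert-base-v2')
    model = AlbertModel.from_pretrained('albert-base-v2')

    text = "Replace me by any text you'd like."
    encoded_input = tokenizer(text, return_tensors='pt')

    # Optional: move the model and inputs to the Colab GPU if one is available.
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model = model.to(device)
    encoded_input = {k: v.to(device) for k, v in encoded_input.items()}
    output = model(**encoded_input)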