
AttributeError: 'Tokenizer' object has no attribute 'oov_token' in Keras

I am trying to encode my text using a tokenizer loaded from a pickle file, but I get the following error:

AttributeError: 'Tokenizer' object has no attribute 'oov_token'

I included the code below:

from keras.preprocessing.text import Tokenizer
from keras.preprocessing import sequence
from keras.models import Model, Input, Sequential, load_model
import pickle
import h5py

maxlen = 100
with open('tokenizer.pickle', 'rb') as tok:
    tokenizer = pickle.load(tok)
model = load_model('weights.h5')

def predict():
    new_text = sequence.pad_sequences(tokenizer.texts_to_sequences(['heyyyy']), maxlen=maxlen)
    prediction = model.predict(new_text, batch_size=1, verbose=2)
    return prediction

The error is raised by the call tokenizer.texts_to_sequences(['heyyyy']), and I'm not sure why. Is the problem with pickle? The same call works with 'hey', 'heyy', and 'heyyy'.

Any guidance is appreciated!


  • This is most probably this issue:

    You can manually set tokenizer.oov_token = None to fix this.

    Pickle is not a reliable way to serialize objects since it assumes that the underlying Python code/modules you're importing have not changed. In general, DO NOT use pickled objects with a different version of the library than what was used at pickling time. That's not a Keras issue, it's a generic Python/Pickle issue. In this case there's a simple fix (set the attribute) but in many cases there will not be.
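
To make the attribute fix concrete, here is a minimal sketch of a loader that applies it defensively. By default, unpickling restores an instance's saved attributes without running `__init__` again, so any attribute that only a newer version of the class sets will be missing. The helper name `load_pickled_tokenizer` and its `defaults` parameter are made up for illustration and are not part of Keras; the only Keras-specific assumption is that `oov_token = None` is the right default, as stated above.

```python
import pickle


def load_pickled_tokenizer(path, defaults=None):
    """Unpickle an object and fill in attributes that the installed
    library version expects but the pickled state lacks.

    Hypothetical helper: neither this function nor `defaults`
    comes from Keras.
    """
    if defaults is None:
        # The one attribute the error message complains about.
        defaults = {"oov_token": None}

    # Only unpickle files you trust: pickle can execute arbitrary code.
    with open(path, "rb") as f:
        obj = pickle.load(f)

    for name, value in defaults.items():
        # Add only what is missing; never overwrite restored state.
        if not hasattr(obj, name):
            setattr(obj, name, value)
    return obj
```

With this in place, the loading step in the question would become `tokenizer = load_pickled_tokenizer('tokenizer.pickle')`. This only papers over one known missing attribute; if other attributes changed between versions, re-creating the tokenizer with the installed library version is the safer route.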