
Tensorflow Keras "TypeError: '>=' not supported between instances of 'int' and 'tuple'"


I am trying to solve an assignment from deeplearning.ai. While converting sentences to sequences, I get the following error.

TypeError                                 Traceback (most recent call last)
<ipython-input-50-934f9fde7150> in <module>
      1 # Test your function
----> 2 train_pad_trunc_seq = seq_pad_and_trunc(train_sentences, tokenizer, PADDING, TRUNCATING, maxlen=16)
      3 val_pad_trunc_seq = seq_pad_and_trunc(val_sentences, tokenizer, PADDING, TRUNCATING, MAXLEN)
      4 
      5 print(f"Padded and truncated training sequences have shape: {train_pad_trunc_seq.shape}\n")

<ipython-input-47-1ad2379829b0> in seq_pad_and_trunc(sentences, tokenizer, padding, truncating, maxlen)
     16 
     17     # Convert sentences to sequences
---> 18     sequences = tokenizer.texts_to_sequences(sentences)
     19 
     20     # Pad the sequences using the correct padding, truncating and maxlen

/opt/conda/lib/python3.8/site-packages/keras_preprocessing/text.py in texts_to_sequences(self, texts)
    279             A list of sequences.
    280         """
--> 281         return list(self.texts_to_sequences_generator(texts))
    282 
    283     def texts_to_sequences_generator(self, texts):

/opt/conda/lib/python3.8/site-packages/keras_preprocessing/text.py in texts_to_sequences_generator(self, texts)
    315                 i = self.word_index.get(w)
    316                 if i is not None:
--> 317                     if num_words and i >= num_words:
    318                         if oov_token_index is not None:
    319                             vect.append(oov_token_index)

TypeError: '>=' not supported between instances of 'int' and 'tuple'

Here is the link to my GitHub repo with the related code.

https://github.com/dkonuk/datascience/blob/main/C3W3_Assignment.ipynb
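
For context, the traceback comes from a helper that first converts the sentences to sequences with the tokenizer and then pads them. Below is a minimal sketch of such a function, reconstructed from the traceback; the pad_sequences call and the return variable name are assumptions rather than a copy of the assignment code.

from tensorflow.keras.preprocessing.sequence import pad_sequences

def seq_pad_and_trunc(sentences, tokenizer, padding, truncating, maxlen):
    # Convert sentences to sequences (this is the line that raises the TypeError)
    sequences = tokenizer.texts_to_sequences(sentences)
    # Pad/truncate the sequences to a fixed length
    pad_trunc_sequences = pad_sequences(sequences, maxlen=maxlen,
                                        padding=padding, truncating=truncating)
    return pad_trunc_sequences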


Solution

  • The tf.keras.preprocessing.text.Tokenizer constructor does not take train_sentences as an argument. Because you pass train_sentences as the first positional argument, it is bound to the num_words parameter, so the check if num_words and i >= num_words inside texts_to_sequences_generator compares a word index (an int) against your sentence data instead of an integer vocabulary limit, which raises the TypeError.

    Replace the following

    tokenizer = Tokenizer(train_sentences, oov_token=OOV_TOKEN)
    

    with the line below in the fit_tokenizer() method.

    tokenizer = Tokenizer(oov_token=OOV_TOKEN)
    

    For more information on the Tokenizer, please refer to the Keras Tokenizer documentation. Thank you!
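
    Putting the fix together, here is a minimal sketch of the corrected fit_tokenizer() step and the subsequent call. The function body is an assumption built around the assignment's names (OOV_TOKEN, PADDING, TRUNCATING, MAXLEN, seq_pad_and_trunc), not the official solution.

    from tensorflow.keras.preprocessing.text import Tokenizer

    def fit_tokenizer(train_sentences):
        # Instantiate the Tokenizer with only the OOV token; the training
        # sentences are supplied via fit_on_texts, not the constructor
        tokenizer = Tokenizer(oov_token=OOV_TOKEN)
        # Build the word index from the training sentences
        tokenizer.fit_on_texts(train_sentences)
        return tokenizer

    tokenizer = fit_tokenizer(train_sentences)
    train_pad_trunc_seq = seq_pad_and_trunc(train_sentences, tokenizer,
                                            PADDING, TRUNCATING, MAXLEN)

    Once the sentences are no longer passed to the constructor, num_words keeps its default of None, so the if num_words and i >= num_words check in texts_to_sequences no longer fails.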