I am trying to use the following code to vectorize a sentence:
from tensorflow.keras.layers import TextVectorization
text_vectorization_layer = TextVectorization(
    max_tokens=10000,
    ngrams=5,
    standardize='lower_and_strip_punctuation',
    output_mode='int',
    output_sequence_length=15,
)
text_vectorization_layer(['BlackBerry Limited is a Canadian software'])
However, it fails with the following error:
AttributeError: 'NoneType' object has no attribute 'ndims'
The TextVectorization layer needs a vocabulary before it can map strings to integers. You can either compute one from your data with the adapt method or pass a vocabulary array to the layer's vocabulary argument. Here is a working example using adapt:
import tensorflow as tf
text_vectorization_layer = tf.keras.layers.TextVectorization(
    max_tokens=10000,
    ngrams=5,
    standardize='lower_and_strip_punctuation',
    output_mode='int',
    output_sequence_length=15,
)
text_vectorization_layer.adapt(['BlackBerry Limited is a Canadian software'])
print(text_vectorization_layer(['BlackBerry Limited is a Canadian software']))
tf.Tensor([[18 7 11 21 13 2 17 6 10 20 12 16 5 9 19]], shape=(1, 15), dtype=int64)
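The other option, passing a vocabulary directly instead of calling adapt, might look roughly like the sketch below. The word list and the shorter output length are assumptions made for illustration, and ngrams is left out so the vocabulary only has to contain single words. In 'int' mode the layer reserves index 0 for padding and index 1 for out-of-vocabulary tokens, so the supplied terms start at index 2.

```python
import tensorflow as tf

# Hypothetical vocabulary for illustration; in practice this would come
# from your own corpus or a saved vocabulary file.
vocab = ["blackberry", "limited", "is", "a", "canadian", "software"]

layer = tf.keras.layers.TextVectorization(
    standardize="lower_and_strip_punctuation",
    output_mode="int",
    output_sequence_length=8,
    vocabulary=vocab,  # no adapt() call needed when this is set
)

ids = layer(tf.constant(["BlackBerry Limited is a Canadian software"]))
# Index 0 is padding and index 1 is the OOV token, so the supplied
# vocabulary starts at 2 and the result is padded with zeros.
print(ids.numpy().tolist())  # [[2, 3, 4, 5, 6, 7, 0, 0]]
```

If you keep ngrams=5, the vocabulary you pass would also need to contain the multi-word n-grams, which is why adapt is usually the more convenient route in that case.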
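The layer standardizes and tokenizes the strings internally: with the settings above it lowercases the text, strips punctuation, splits on whitespace, and builds n-grams of up to 5 words before looking each one up in the vocabulary. That is why you pass raw strings rather than pre-split tokens. The TextVectorization documentation covers both ways of setting the vocabulary in more detail.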
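To make the internal steps more concrete, here is a rough pure-Python sketch of the same pipeline. It is not the Keras implementation: the helper names are made up for illustration, and the ids are assigned in order of first appearance, whereas Keras orders the vocabulary by frequency. The actual numbers will therefore differ from the tensor printed above.

```python
import re
import string


def standardize(text):
    # Rough stand-in for 'lower_and_strip_punctuation'.
    return re.sub("[" + re.escape(string.punctuation) + "]", "", text.lower())


def ngrams_up_to(tokens, n):
    # All n-grams of size 1..n, joined with spaces.
    grams = []
    for size in range(1, n + 1):
        for i in range(len(tokens) - size + 1):
            grams.append(" ".join(tokens[i:i + size]))
    return grams


def build_vocab(grams):
    # Ids start at 2: 0 is reserved for padding, 1 for unknown tokens.
    vocab = {}
    for gram in grams:
        if gram not in vocab:
            vocab[gram] = len(vocab) + 2
    return vocab


def vectorize(text, vocab, n, length):
    grams = ngrams_up_to(standardize(text).split(), n)
    ids = [vocab.get(gram, 1) for gram in grams][:length]  # truncate
    return ids + [0] * (length - len(ids))                 # pad


sentence = "BlackBerry Limited is a Canadian software"
grams = ngrams_up_to(standardize(sentence).split(), 5)
vocab = build_vocab(grams)  # plays the role of adapt()
ids = vectorize(sentence, vocab, 5, 15)
print(len(grams), ids)
```

Six words with n-grams of up to 5 give 6 + 5 + 4 + 3 + 2 = 20 entries, which is consistent with the ids in the real output ranging from 2 to 21 and with output_sequence_length=15 cutting the sequence short.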