python tensorflow keras word-embedding functional-api

How to use an Embedding layer with TextVectorization in the functional API


I'm just getting started with TensorFlow.

I'm working on the IMDB dataset. The process: encode the text with a TextVectorization layer and pass the result to an Embedding layer:

import re
import string
import tensorflow as tf
from tensorflow.keras.layers import TextVectorization

# Create a custom standardization function to strip HTML break tags '<br />'.
def custom_standardization(input_data):
  lowercase = tf.strings.lower(input_data)
  stripped_html = tf.strings.regex_replace(lowercase, '<br />', ' ')
  return tf.strings.regex_replace(stripped_html,
                                  '[%s]' % re.escape(string.punctuation), '')


# Vocabulary size and number of words in a sequence.
vocab_size = 10000
sequence_length = 100

# Use the text vectorization layer to normalize, split, and map strings to
# integers. Note that the layer uses the custom standardization defined above.
# Set output_sequence_length so every sample is padded or truncated to the same length.
vectorize_layer = TextVectorization(
    standardize=custom_standardization,
    max_tokens=vocab_size,
    output_mode='int',
    output_sequence_length=sequence_length)

# Make a text-only dataset (no labels) and call adapt to build the vocabulary.
text_ds = train_ds.map(lambda x, y: x)
vectorize_layer.adapt(text_ds)

I then try to build the model with the functional API:

embedding_dim=16
text_model_catprocess2 = vectorize_layer
text_model_embedd = tf.keras.layers.Embedding(vocab_size, embedding_dim, name = 'embedding')(text_model_catprocess2)
text_model_embed_proc = tf.keras.layers.Lambda(embedding_mean_standard)(text_model_embedd)
text_model_dense1 = tf.keras.layers.Dense(2, activation = 'relu')(text_model_embed_proc)
text_model_dense2 = tf.keras.layers.Dense(2, activation = 'relu')(text_model_dense1)
text_model_output = tf.keras.layers.Dense(1, activation = 'sigmoid')(text_model_dense2)

However, this gives the following error:

~\anaconda3\lib\site-packages\keras\backend.py in dtype(x)
1496 
1497   """
-> 1498   return x.dtype.base_dtype.name
1499 
1500 

AttributeError: Exception encountered when calling layer "embedding" (type Embedding).

'str' object has no attribute 'base_dtype'

Call arguments received:
  • inputs=<keras.layers.preprocessing.text_vectorization.TextVectorization object at 0x0000029B483AADC0>

When I build the same thing as a Sequential model like this, it works fine:

embedding_dim = 16
modelcheck = tf.keras.Sequential([
    vectorize_layer,
    tf.keras.layers.Embedding(vocab_size, embedding_dim, name="embedding"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1)
])

I am not sure why this is happening. Is it necessary for the functional API to have an input? Please help!


Solution

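  • You have two options. Either use a Sequential model, which works (as you have confirmed) because you do not have to define an Input layer yourself, or use the functional API, where you do. In your functional version the TextVectorization layer object itself was passed to the Embedding layer, which is what the error's "inputs=<...TextVectorization object...>" line shows. A layer has to be called on a tensor, so start from an Input with dtype=tf.string and call vectorize_layer on it: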

    embedding_dim = 16
    # Symbolic input: one raw string per example.
    text_model_input = tf.keras.layers.Input(dtype=tf.string, shape=(1,))
    # Call the adapted layer on the input tensor instead of passing the layer object.
    text_model_catprocess2 = vectorize_layer(text_model_input)
    text_model_embedd = tf.keras.layers.Embedding(vocab_size, embedding_dim, name='embedding')(text_model_catprocess2)
    # embedding_mean_standard is the asker's own function (not shown in the question).
    text_model_embed_proc = tf.keras.layers.Lambda(embedding_mean_standard)(text_model_embedd)
    text_model_dense1 = tf.keras.layers.Dense(2, activation='relu')(text_model_embed_proc)
    text_model_dense2 = tf.keras.layers.Dense(2, activation='relu')(text_model_dense1)
    text_model_output = tf.keras.layers.Dense(1, activation='sigmoid')(text_model_dense2)
    # Tie the input and output tensors together into a model.
    model = tf.keras.Model(text_model_input, text_model_output)
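
For anyone who wants to try the functional version end to end, here is a minimal self-contained sketch. It rests on several assumptions that are not in the question: a tiny made-up corpus stands in for the IMDB data, the default standardization is used instead of custom_standardization, the sizes are shrunk for illustration, and GlobalAveragePooling1D (borrowed from the Sequential version) replaces the Lambda layer because embedding_mean_standard is not shown. It assumes a TensorFlow version that provides tf.keras.layers.TextVectorization.

```python
import tensorflow as tf

# Hypothetical stand-in corpus; the question uses the IMDB reviews instead.
texts = ["a great movie", "a terrible film", "great acting", "terrible plot"]

# Assumed small values, chosen only to keep the example quick.
vocab_size = 50
sequence_length = 6
embedding_dim = 4

# Build the vocabulary from the text before the layer is used in a model.
vectorize_layer = tf.keras.layers.TextVectorization(
    max_tokens=vocab_size,
    output_mode="int",
    output_sequence_length=sequence_length)
vectorize_layer.adapt(tf.data.Dataset.from_tensor_slices(texts).batch(2))

# Functional API: start from an Input tensor and call each layer on a tensor.
text_input = tf.keras.layers.Input(shape=(1,), dtype=tf.string)
token_ids = vectorize_layer(text_input)
embedded = tf.keras.layers.Embedding(
    vocab_size, embedding_dim, name="embedding")(token_ids)
# Assumption: mean pooling replaces the asker's undefined Lambda function.
pooled = tf.keras.layers.GlobalAveragePooling1D()(embedded)
output = tf.keras.layers.Dense(1, activation="sigmoid")(pooled)
model = tf.keras.Model(text_input, output)

# The model now accepts raw strings, one per row.
preds = model(tf.constant([["a great movie"], ["a terrible film"]])).numpy()
print(preds.shape)
```

The point of the sketch is only the wiring: every layer call receives a tensor (text_input, token_ids, and so on), never a layer object, which is the difference from the code in the question.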