python, tensorflow, keras, neural-network

What's the output of an Embedding layer in tensorflow and what does GlobalAveragePooling1D do to it?


I'm having trouble understanding what 1D global average pooling does to an embedding layer. I know that embedding layers are like lookup tables. If I have tf.keras.layers.Embedding(vocab_size=30, embedding_dim=7, input_length=10), is the output after the forward pass a matrix of 10 rows x 7 columns or a 3D tensor of 1 row x 7 columns x 10 length?

If it's 10 rows x 7 columns, does it take the average of each row and output a single vector of shape 10 rows x 1 column?

If it's 1 row x 7 columns x 10 length, does it take the average of each vector and also output a single vector of shape 10 rows x 1 column?


Solution

  • To your first question: What's the output of an Embedding layer in tensorflow?

    The Embedding layer maps each integer value in a sequence, each representing a unique word in the vocabulary, to a 7-dimensional vector. In the following example, you have two sequences with 10 integer values each. These integer values can range from 0 to 29, where 30 is the size of the vocabulary. Each integer value of each sequence is mapped to a 7-dimensional vector, resulting in the output shape (2, 10, 7), where 2 is the number of samples, 10 is the sequence length, and 7 is the embedding dimension:

    import tensorflow as tf
    
    samples = 2
    texts = tf.random.uniform((samples, 10), maxval=30, dtype=tf.int32)
    
    embedding_layer = tf.keras.layers.Embedding(30, 7, input_length=10)
    print(embedding_layer(texts))
    
    tf.Tensor(
    [[[ 0.0225671   0.02347589  0.00979777  0.00041901 -0.00628462
        0.02810872 -0.00962182]
      [-0.00848696 -0.04342243 -0.02836052 -0.00517335 -0.0061365
       -0.03012114  0.01677728]
      [ 0.03311044  0.00556745 -0.00702027  0.03381392 -0.04623893
        0.04987461 -0.04816799]
      [-0.03521906  0.0379228   0.03005264 -0.0020758  -0.0384485
        0.04822161 -0.02092661]
      [-0.03521906  0.0379228   0.03005264 -0.0020758  -0.0384485
        0.04822161 -0.02092661]
      [-0.01790254 -0.0175228  -0.01194855 -0.02171307 -0.0059397
        0.02812174  0.01709754]
      [ 0.03117083  0.03501941  0.01058724  0.0452967  -0.03717183
       -0.04691924  0.04459465]
      [-0.0225444   0.01631368 -0.04825303  0.02976335  0.03874404
        0.01886607 -0.04535152]
      [-0.01405543 -0.01035894 -0.01828993  0.01214089 -0.0163126
        0.00249451 -0.03320551]
      [-0.00536104  0.04976835  0.03676006 -0.04985759 -0.04882429
        0.04079831 -0.04694915]]
    
     [[ 0.02474061  0.04651412  0.01263839  0.02834389  0.01770737
        0.027616    0.0391163 ]
      [-0.00848696 -0.04342243 -0.02836052 -0.00517335 -0.0061365
       -0.03012114  0.01677728]
      [-0.02423838  0.00046005  0.01264722 -0.00118362 -0.04956226
       -0.00222496  0.00678415]
      [ 0.02132202  0.02490019  0.015528    0.01769954  0.03830704
       -0.03469712 -0.00817447]
      [-0.03713315 -0.01064591  0.0106518  -0.00899752 -0.04772154
        0.03767705 -0.02580358]
      [ 0.02132202  0.02490019  0.015528    0.01769954  0.03830704
       -0.03469712 -0.00817447]
      [ 0.00416059 -0.03158562  0.00862025 -0.03387908  0.02394537
       -0.00088609  0.01963869]
      [-0.0454465   0.03087567 -0.01201812 -0.02580545  0.02585572
       -0.00974055 -0.02253721]
      [-0.00438716  0.03688161  0.04575384 -0.01561296 -0.0137012
       -0.00927494 -0.02183568]
      [ 0.0225671   0.02347589  0.00979777  0.00041901 -0.00628462
        0.02810872 -0.00962182]]], shape=(2, 10, 7), dtype=float32)
    

    When working with text data, the output of an Embedding layer would be 2 sentences consisting of 10 words each, where each word is mapped to a 7-dimensional vector.

    If you are wondering where these random numbers for each integer in each sequence come from: by default, the Embedding layer's weight matrix is initialized from a uniform distribution (embeddings_initializer="uniform"), and these initial values are what you see until the layer is trained.
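
    To see the lookup-table behaviour directly, one can compare the layer's output with a manual tf.gather on its weight matrix. This is a small sketch; the variable names are illustrative:

```python
import tensorflow as tf

texts = tf.random.uniform((2, 10), maxval=30, dtype=tf.int32)
embedding_layer = tf.keras.layers.Embedding(30, 7)
output = embedding_layer(texts)  # calling the layer builds its weights

# The layer's only weight is the lookup table of shape (vocab_size, embedding_dim)
table = embedding_layer.get_weights()[0]  # shape (30, 7)

# Gathering the row for each integer reproduces the layer's output exactly
manual = tf.gather(table, texts)
print(bool(tf.reduce_all(output == manual)))  # True
```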

    To your second question: What does a 1D global average pooling do to an Embedding layer?

    The GlobalAveragePooling1D layer does nothing more than compute the average over one dimension of a tensor. The following example uses data_format="channels_first", so the layer averages the 7 numbers representing each word and returns a scalar per word, resulting in the output shape (2, 10), where 2 is the number of samples (sentences) and 10 is the number of per-word averages. This is equivalent to simply doing tf.reduce_mean(embedding_layer(texts), axis=-1).

    import tensorflow as tf
    
    samples = 2
    texts = tf.random.uniform((samples, 10), maxval=30, dtype=tf.int32)
    
    embedding_layer = tf.keras.layers.Embedding(30, 7, input_length=10)
    average_layer = tf.keras.layers.GlobalAveragePooling1D(data_format="channels_first")
    print(average_layer(embedding_layer(texts)))
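
    Note that with the default data_format="channels_last", GlobalAveragePooling1D averages over the 10 words of each sequence (the steps axis) instead, which is the usual setup in text classification: each sentence is reduced to a single 7-dimensional vector, giving the output shape (2, 7), equivalent to tf.reduce_mean(embedding_layer(texts), axis=1). A small sketch of that default behaviour:

```python
import tensorflow as tf

texts = tf.random.uniform((2, 10), maxval=30, dtype=tf.int32)
embedded = tf.keras.layers.Embedding(30, 7)(texts)  # shape (2, 10, 7)

# Default data_format="channels_last": average over the sequence (steps) axis
pooled = tf.keras.layers.GlobalAveragePooling1D()(embedded)
print(pooled.shape)  # (2, 7)

# Same result as reducing the mean over axis 1
same = tf.reduce_mean(embedded, axis=1)
print(bool(tf.reduce_all(tf.abs(pooled - same) < 1e-6)))  # True
```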