python, tensorflow, keras, neural-network

What's the output of an Embedding layer in tensorflow and what does GlobalAveragePooling1D do to it?


I'm having trouble understanding what 1D global average pooling does to an embedding layer. I know that embedding layers are like lookup tables. If I have tf.keras.layers.Embedding(vocab_size=30, embedding_dim=7, input_length=10), is the output after the forward pass a matrix of 10 rows x 7 columns or a 3D tensor of 1 row x 7 columns x 10 length?

If it's 10 rows x 7 columns, does it take the average of each row and output a single vector of shape 10 rows x 1 column?

If it's 1 row x 7 columns x 10 length, does it take the average of each vector and also output a single vector of shape 10 rows x 1 column?


Solution

  • To your first question: What's the output of an Embedding layer in tensorflow?

    The Embedding layer maps each integer value in a sequence, each representing a unique word in the vocabulary, to a 7-dimensional vector. In the following example, you have two sequences with 10 integer values each. These integer values can range from 0 to 29, where 30 is the size of the vocabulary. Each integer value of each sequence is mapped to a 7-dimensional vector, resulting in the output shape (2, 10, 7), where 2 is the number of samples, 10 is the sequence length, and 7 is the embedding dimension:

    import tensorflow as tf
    
    samples = 2
    texts = tf.random.uniform((samples, 10), maxval=30, dtype=tf.int32)
    
    embedding_layer = tf.keras.layers.Embedding(30, 7, input_length=10)
    print(embedding_layer(texts))
    
    tf.Tensor(
    [[[ 0.0225671   0.02347589  0.00979777  0.00041901 -0.00628462
        0.02810872 -0.00962182]
      [-0.00848696 -0.04342243 -0.02836052 -0.00517335 -0.0061365
       -0.03012114  0.01677728]
      [ 0.03311044  0.00556745 -0.00702027  0.03381392 -0.04623893
        0.04987461 -0.04816799]
      [-0.03521906  0.0379228   0.03005264 -0.0020758  -0.0384485
        0.04822161 -0.02092661]
      [-0.03521906  0.0379228   0.03005264 -0.0020758  -0.0384485
        0.04822161 -0.02092661]
      [-0.01790254 -0.0175228  -0.01194855 -0.02171307 -0.0059397
        0.02812174  0.01709754]
      [ 0.03117083  0.03501941  0.01058724  0.0452967  -0.03717183
       -0.04691924  0.04459465]
      [-0.0225444   0.01631368 -0.04825303  0.02976335  0.03874404
        0.01886607 -0.04535152]
      [-0.01405543 -0.01035894 -0.01828993  0.01214089 -0.0163126
        0.00249451 -0.03320551]
      [-0.00536104  0.04976835  0.03676006 -0.04985759 -0.04882429
        0.04079831 -0.04694915]]
    
     [[ 0.02474061  0.04651412  0.01263839  0.02834389  0.01770737
        0.027616    0.0391163 ]
      [-0.00848696 -0.04342243 -0.02836052 -0.00517335 -0.0061365
       -0.03012114  0.01677728]
      [-0.02423838  0.00046005  0.01264722 -0.00118362 -0.04956226
       -0.00222496  0.00678415]
      [ 0.02132202  0.02490019  0.015528    0.01769954  0.03830704
       -0.03469712 -0.00817447]
      [-0.03713315 -0.01064591  0.0106518  -0.00899752 -0.04772154
        0.03767705 -0.02580358]
      [ 0.02132202  0.02490019  0.015528    0.01769954  0.03830704
       -0.03469712 -0.00817447]
      [ 0.00416059 -0.03158562  0.00862025 -0.03387908  0.02394537
       -0.00088609  0.01963869]
      [-0.0454465   0.03087567 -0.01201812 -0.02580545  0.02585572
       -0.00974055 -0.02253721]
      [-0.00438716  0.03688161  0.04575384 -0.01561296 -0.0137012
       -0.00927494 -0.02183568]
      [ 0.0225671   0.02347589  0.00979777  0.00041901 -0.00628462
        0.02810872 -0.00962182]]], shape=(2, 10, 7), dtype=float32)
    

    When working with text data, the output of an Embedding layer would be 2 sentences consisting of 10 words each, where each word is mapped to a 7-dimensional vector.

    If you are wondering where these random numbers for each integer in each sequence come from: by default, the Embedding layer's weight matrix is initialized from a uniform distribution (embeddings_initializer="uniform"), and these initial values are what you see until the layer is trained.
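
    To see the lookup-table behaviour directly, one can compare the layer's output with a manual tf.gather on its weight matrix. This is a small sketch; the variable names are illustrative:

```python
import tensorflow as tf

texts = tf.random.uniform((2, 10), maxval=30, dtype=tf.int32)
embedding_layer = tf.keras.layers.Embedding(30, 7)
output = embedding_layer(texts)  # calling the layer builds its weights

# The layer's only weight is the lookup table of shape (vocab_size, embedding_dim)
table = embedding_layer.get_weights()[0]  # shape (30, 7)

# Gathering the row for each integer reproduces the layer's output exactly
manual = tf.gather(table, texts)
print(bool(tf.reduce_all(output == manual)))  # True
```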

    To your second question: What does a 1D global average pooling do to an Embedding layer?

    The GlobalAveragePooling1D layer does nothing more than compute the average over one dimension of a tensor. The following example uses data_format="channels_first", so the layer averages the 7 numbers representing each word and returns a scalar per word, resulting in the output shape (2, 10), where 2 is the number of samples (sentences) and 10 is the number of per-word averages. This is equivalent to simply doing tf.reduce_mean(embedding_layer(texts), axis=-1).

    import tensorflow as tf
    
    samples = 2
    texts = tf.random.uniform((samples, 10), maxval=30, dtype=tf.int32)
    
    embedding_layer = tf.keras.layers.Embedding(30, 7, input_length=10)
    average_layer = tf.keras.layers.GlobalAveragePooling1D(data_format="channels_first")
    print(average_layer(embedding_layer(texts)))
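
    Note that with the default data_format="channels_last", GlobalAveragePooling1D averages over the 10 words of each sequence (the steps axis) instead, which is the usual setup in text classification: each sentence is reduced to a single 7-dimensional vector, giving the output shape (2, 7), equivalent to tf.reduce_mean(embedding_layer(texts), axis=1). A small sketch of that default behaviour:

```python
import tensorflow as tf

texts = tf.random.uniform((2, 10), maxval=30, dtype=tf.int32)
embedded = tf.keras.layers.Embedding(30, 7)(texts)  # shape (2, 10, 7)

# Default data_format="channels_last": average over the sequence (steps) axis
pooled = tf.keras.layers.GlobalAveragePooling1D()(embedded)
print(pooled.shape)  # (2, 7)

# Same result as reducing the mean over axis 1
same = tf.reduce_mean(embedded, axis=1)
print(bool(tf.reduce_all(tf.abs(pooled - same) < 1e-6)))  # True
```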