python, machine-learning, keras, keras-layer, embedding

How does the keras Embedding layer work if an input value is greater than input_dim?


How does the Embedding layer work if an input value is greater than input_dim?

Why doesn't keras raise an exception?

from keras.models import Sequential
from keras.layers import Embedding

model = Sequential()
model.add(Embedding(1, 2, trainable=True, mask_zero=False))
input_array = [5]

model.compile("rmsprop", "mse")

output_array = model.predict(input_array)

output_array
# array([[[0., 0.]]], dtype=float32)

Here the input value is 5 and input_dim is 1.

The documentation says that the input value (5) must be less than input_dim (1). In my example that condition is violated, but the code still raises no exception.

Thank you!


Solution

  • The Embedding layer uses a lookup matrix with shape (input_dim, output_dim), where input_dim is the number of embedding vectors to be learned. When you pass an index, the layer takes the vector at that index from the embedding matrix.
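
    As a rough sketch of that lookup (plain NumPy with made-up sizes, not the actual Keras internals):

    import numpy as np
    
    # An Embedding(input_dim=4, output_dim=2) is essentially a (4, 2) lookup
    # matrix; each input index selects one of its rows.
    embeddings = np.random.uniform(-0.05, 0.05, size=(4, 2))
    
    indices = np.array([0, 3, 1])   # valid indices are 0 .. input_dim - 1
    vectors = embeddings[indices]   # "gather": one row per index
    print(vectors.shape)            # (3, 2)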

    Thanks for pointing out that I was confusing input_length with input_dim.

    First of all, there is an error if you use tensorflow.keras.

    tensorflow

    from tensorflow.keras.models import Model
    from tensorflow.keras.layers import Embedding, Input
    import numpy as np
    
    ip = Input(shape = (3,))
    emb = Embedding(1, 2, trainable=True, mask_zero=True)(ip)
    
    model = Model(ip, emb)
    input_array = np.array([[5, 3, 1], [1, 2, 3]])
    
    model.compile("rmsprop", "mse")
    
    output_array = model.predict(input_array)
    
    print(output_array)
    
    print(output_array.shape)
    
    model.summary()
    

    [Screenshot: tensorflow.keras raises an error for the out-of-range index.]

    But if I use keras 2.3.1, I don't get any error.

    keras 2.3.1

    from keras.models import Model
    from keras.layers import Embedding, Input
    import numpy as np
    
    ip = Input(shape = (3,))
    emb = Embedding(1, 2, trainable=True, mask_zero=True)(ip)
    
    model = Model(ip, emb)
    input_array = np.array([[5, 3, 1], [1, 2, 3]])
    
    model.compile("rmsprop", "mse")
    
    output_array = model.predict(input_array)
    
    print(output_array)
    
    print(output_array.shape)
    
    model.summary()
    

    [Screenshot: the keras 2.3.1 output; no error is raised.]

    So, is keras broken? The first thing to notice is that keras and tensorflow.keras have different implementations of the Embedding layer. To validate that, let's go to the keras Embedding layer source.

    https://github.com/keras-team/keras/blob/master/keras/layers/embeddings.py#L16

    For now, let's just look at the call function.

        def call(self, inputs):
            if K.dtype(inputs) != 'int32':
                inputs = K.cast(inputs, 'int32')
            out = K.gather(self.embeddings, inputs)
            return out
    

    N.B.: If you want the exact source code for keras 2.3.1, download it from the releases page: https://github.com/keras-team/keras/releases

    But if we go to the tensorflow implementation, it's different.

    https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/embedding_ops.py

    Just to verify, the call function is written differently.

      def call(self, inputs):
        dtype = K.dtype(inputs)
        if dtype != 'int32' and dtype != 'int64':
          inputs = math_ops.cast(inputs, 'int32')
        out = embedding_ops.embedding_lookup(self.embeddings, inputs)
        return out
    

    Now, we could dig deeper to pinpoint exactly why keras doesn't throw the error while tensorflow.keras does, but let's settle a simpler point first: is the keras Embedding layer doing something wrong?

    Let's design a simple network like before and observe the weight matrix.

    from keras.models import Model
    from keras.layers import Embedding, Input
    import numpy as np
    
    ip = Input(shape = (3,))
    emb = Embedding(1, 2, trainable=True, mask_zero=True)(ip)
    
    model = Model(ip, emb)
    input_array = np.array([[5, 3, 1], [1, 2, 3]])
    
    model.compile("rmsprop", "mse")
    
    output_array = model.predict(input_array)
    
    print(output_array)
    
    print(output_array.shape)
    
    model.summary()
    

    The model gives the following output.

    [[[0. 0.]
      [0. 0.]
      [0. 0.]]
    
     [[0. 0.]
      [0. 0.]
      [0. 0.]]]
    (2, 3, 2)
    Model: "model_18"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    input_21 (InputLayer)        (None, 3)                 0         
    _________________________________________________________________
    embedding_33 (Embedding)     (None, 3, 2)              2         
    =================================================================
    Total params: 2
    Trainable params: 2
    Non-trainable params: 0
    

    Okay, we are getting a bunch of zeros, but the default weight initializer is not zeros!
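
    A quick check of that default (a small sketch, assuming keras 2.3.1, whose constructor default is embeddings_initializer='uniform'):

    from keras.layers import Embedding
    
    # The default embeddings_initializer is 'uniform' (random uniform),
    # not zeros.
    layer = Embedding(1, 2)
    print(layer.embeddings_initializer)   # a RandomUniform initializer instance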

    So, let's observe the weight matrix now.

    w = model.layers[1].get_weights()
    print(w)
    
    
    [array([[ 0.03680499, -0.04904002]], dtype=float32)]
    

    In fact, it is not all zeros.

    So, why are we getting zeros?

    Let's change our input to the model.

    Since the only in-vocabulary word index for input_dim = 1 is 0, let's pass 0 as one of the inputs.

    from keras.models import Model
    from keras.layers import Embedding, Input
    import numpy as np
    
    ip = Input(shape = (3,))
    emb = Embedding(1, 2, trainable=True, mask_zero=True)(ip)
    
    model = Model(ip, emb)
    input_array = np.array([[5, 0, 1], [1, 2, 0]])
    
    model.compile("rmsprop", "mse")
    
    output_array = model.predict(input_array)
    
    print(output_array)
    
    print(output_array.shape)
    
    model.summary()
    

    Now, we get non-zero vectors for the positions where we passed 0.

    [[[ 0.          0.        ]
      [-0.04339869 -0.04900574]
      [ 0.          0.        ]]
    
     [[ 0.          0.        ]
      [ 0.          0.        ]
      [-0.04339869 -0.04900574]]]
    (2, 3, 2)
    Model: "model_19"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #   
    =================================================================
    input_22 (InputLayer)        (None, 3)                 0         
    _________________________________________________________________
    embedding_34 (Embedding)     (None, 3, 2)              2         
    =================================================================
    Total params: 2
    Trainable params: 2
    Non-trainable params: 0
    

    In short, Keras maps any out-of-vocabulary word index to a zero vector, which is reasonable: for those positions the forward pass ensures all contributions are nil (though the biases may still play a role). It is a little counter-intuitive, though, as passing out-of-vocabulary tokens to the model is an overhead (rather than just removing them in the pre-processing step) and a bad practice.
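
    If you prefer to handle this in pre-processing instead, here is a minimal sketch (the vocabulary size and the reserved OOV index below are assumptions for illustration, not anything Keras defines):

    import numpy as np
    
    vocab_size = 10    # assumed vocabulary: valid indices are 0 .. 9
    oov_index = 1      # assumed reserved "unknown" index
    
    token_ids = np.array([[5, 3, 27], [1, 42, 9]])
    # Remap anything outside the vocabulary to the OOV index before it
    # reaches the Embedding layer.
    token_ids = np.where(token_ids < vocab_size, token_ids, oov_index)
    print(token_ids)
    # [[5 3 1]
    #  [1 1 9]]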

    The lesson would be to avoid standalone Keras altogether and shift to tensorflow.keras, as the maintainers clearly state that versions after 2.2 will receive only minimal support and minor bug fixes.
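
    For completeness, a small sketch of the same toy model in tensorflow.keras with input_dim sized from the data (max index + 1), so no index is out of range and no error is raised:

    from tensorflow.keras.models import Model
    from tensorflow.keras.layers import Embedding, Input
    import numpy as np
    
    input_array = np.array([[5, 3, 1], [1, 2, 3]])
    
    # input_dim must cover every index that can appear in the input.
    vocab_size = int(input_array.max()) + 1   # 6 for this input
    
    ip = Input(shape=(3,))
    emb = Embedding(vocab_size, 2, trainable=True, mask_zero=True)(ip)
    model = Model(ip, emb)
    model.compile("rmsprop", "mse")
    
    output_array = model.predict(input_array)
    print(output_array.shape)   # (2, 3, 2)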

    The relevant issue on the keras GitHub repo: https://github.com/keras-team/keras/issues/13989