I just started with text classification, and I got stuck at the embedding layer. If I have a batch of sequences encoded as integers corresponding to each word, what does the embedding layer look like? Are there neurons like in a normal neural layer?
I've seen keras.layers.Embedding, but after reading the documentation I'm really confused about how it works. I can understand input_dim, but why is output_dim a 2D matrix? How many weights do I have in this embedding layer?
I'm sorry if my question is not explained clearly. I have no experience in NLP; if word embeddings are common basics in NLP, please tell me and I will read up on them.
An embedding layer is just a trainable look-up table: it takes an integer index as input and returns the word embedding associated with that index:
index | word embeddings
=============================================================================
0 | word embedding for the word with index 0 (usually used for padding)
-----------------------------------------------------------------------------
1 | word embedding for the word with index 1
-----------------------------------------------------------------------------
2 | word embedding for the word with index 2
-----------------------------------------------------------------------------
. |
. |
. |
-----------------------------------------------------------------------------
N | word embedding for the word with index N
-----------------------------------------------------------------------------
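To see this look-up behaviour concretely, here is a minimal sketch, assuming TensorFlow 2.x Keras; the vocabulary size of 10 and embedding size of 4 are made-up values for illustration:

import numpy as np
import tensorflow as tf

layer = tf.keras.layers.Embedding(input_dim=10, output_dim=4)
out = layer(np.array([2]))                      # look up the embedding for index 2 (this first call also creates the weights)
table = layer.get_weights()[0]                  # the trainable look-up table, shape (10, 4)
print(np.allclose(out.numpy()[0], table[2]))    # True: the output is just row 2 of the table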
It is trainable in the sense that the embedding values are not necessarily fixed and can be changed during training. The input_dim argument is actually the number of words (or, more generally, the number of distinct elements in the sequences). The output_dim argument specifies the dimension of each word embedding; for example, with output_dim=100 each word embedding would be a vector of size 100. Further, since the input of an embedding layer is a batch of integer sequences (corresponding to the words in sentences), its output has a shape of (num_sequences, len_sequence, output_dim), i.e. for each integer in a sequence an embedding vector of size output_dim is returned.
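As a sketch of the shapes, again assuming TensorFlow 2.x Keras, with an arbitrary batch size, sequence length and vocabulary size chosen just for illustration:

import numpy as np
import tensorflow as tf

num_sequences, len_sequence = 32, 10                               # 32 sentences, 10 words each
layer = tf.keras.layers.Embedding(input_dim=1000, output_dim=100)
batch = np.random.randint(0, 1000, size=(num_sequences, len_sequence))
print(layer(batch).shape)                                          # (32, 10, 100) == (num_sequences, len_sequence, output_dim)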
As for the number of weights in an embedding layer, it is very easy to calculate: there are input_dim unique indices and each index is associated with a word embedding of size output_dim. Therefore the number of weights in an embedding layer is input_dim x output_dim.
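You can check this count in Keras itself; a small sketch using the same arbitrary sizes as above:

import numpy as np
import tensorflow as tf

layer = tf.keras.layers.Embedding(input_dim=1000, output_dim=100)
_ = layer(np.array([[0]]))                  # call the layer once so its weight table is created
print(layer.count_params())                 # 100000 == input_dim x output_dim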