python machine-learning keras nlp embedding

Can a matrix be given as input to Keras's embedding layer?


I am using Keras to capture semantic information from a dataset. I have already tokenized the data into integer vectors, which look like this:

texts=[[1,2,3,2,1],
       [2,3,4,2,2],
       [3,33,2,1,3]]

labels=[1,0,1]

The labels contain only 0 or 1, and each list in texts has exactly one label. I want to use Keras's Embedding layer to embed this data, but the examples I found online only pass a single list:

texts=[1,2,3,4,2,1]

Can I pass a matrix as input to the Embedding layer?


Solution

  • Yes. Each list in the texts list is a training sample, and there is a corresponding label for each of them in the labels list. Therefore, each training sample is just a vector of integers (i.e. word indices), and the matrix as a whole is a batch of such samples, which you can feed directly to an Embedding layer:

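    from tensorflow.keras.layers import Embedding, Input
    # num_words_per_sample is 5 for the data above; vocab_size must exceed the largest word index
    inp = Input(shape=(num_words_per_sample,))
    x = Embedding(vocab_size, emb_dim)(inp)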
    

    Note that you may need to convert the training data and labels to NumPy arrays (if they are not already):

    import numpy as np
    texts = np.array(texts)
    print(texts.shape)  # (3, 5)  <--- three samples, each containing 5 words
    
    labels = np.array(labels)
    print(labels.shape) # (3,)   <--- three labels, one for each sample
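
To see why a matrix works as input, it may help to sketch the lookup an embedding layer performs. The snippet below is a NumPy illustration only, not Keras's actual implementation: `embedding_matrix` is a hypothetical stand-in for the layer's trainable weights, and `vocab_size = 34` and `emb_dim = 4` are values assumed for the example data (the largest index is 33, so the table needs at least 34 rows).

```python
import numpy as np

# Token-index matrix from the question: 3 samples, 5 word indices each.
texts = np.array([[1, 2, 3, 2, 1],
                  [2, 3, 4, 2, 2],
                  [3, 33, 2, 1, 3]])

vocab_size = 34  # assumption: largest word index (33) + 1
emb_dim = 4      # assumption: arbitrary embedding size for illustration

# Hypothetical stand-in for the layer's weights: one row per word index.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(vocab_size, emb_dim))

# Integer-array indexing fetches one row per index, for the whole batch at once.
embedded = embedding_matrix[texts]

print(texts.shape)     # (3, 5)
print(embedded.shape)  # (3, 5, 4)
```

Each integer is replaced by its row of the table, so a (samples, words) matrix becomes a (samples, words, emb_dim) tensor. The same word index always maps to the same vector, regardless of which sample or position it appears in.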