I am using Keras to capture semantic information from a dataset. I have already tokenized the data into integer vectors, which look like this:
texts=[[1,2,3,2,1],
[2,3,4,2,2],
[3,33,2,1,3]]
labels=[1,0,1]
The labels contain only 0 or 1, and each list has exactly one label. I want to use Keras's Embedding layer to embed this data, but the examples I found on the Internet only use a single list:
texts=[1,2,3,4,2,1]
Can I pass a matrix as input to the Embedding layer?
Each list in texts is a training sample, and each one has a corresponding label in labels. Therefore, each training sample is just a vector of integers (i.e. word indices), which you can easily feed to an Embedding layer:
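from keras.layers import Input, Embedding
inp = Input(shape=(num_words_per_sample,))  # one sample = num_words_per_sample word indices
x = Embedding(vocab_size, emb_dim)(inp)  # output shape: (batch, num_words_per_sample, emb_dim)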
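To see why a matrix works as input, it may help to think of the embedding as a table lookup applied to every index. The NumPy-only sketch below imitates that lookup with a random table standing in for the trainable weights. The value of emb_dim and the random table are assumptions for illustration, not anything Keras produces.

```python
import numpy as np

texts = np.array([[1, 2, 3, 2, 1],
                  [2, 3, 4, 2, 2],
                  [3, 33, 2, 1, 3]])

vocab_size = int(texts.max()) + 1  # 34: indices run from 0 to 33
emb_dim = 4                        # hypothetical embedding size

# Random stand-in for the layer's trainable weight matrix.
rng = np.random.default_rng(0)
table = rng.normal(size=(vocab_size, emb_dim))

# Indexing the table with a 2D integer array looks up one row per word index.
embedded = table[texts]
print(embedded.shape)  # (3, 5, 4)
```

Each of the 3 samples becomes a 5 x 4 block, one vector per word. Note that vocab_size has to be larger than the largest index in the data (33 here), otherwise the lookup goes out of range.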
Note that you may need to convert the training data and labels to NumPy arrays (if they are not already):
import numpy as np
texts = np.array(texts)
print(texts.shape)  # (3, 5) <--- three samples, each containing 5 words
labels = np.array(labels)
print(labels.shape) # (3,) <--- three labels, one for each sample
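Putting the pieces together, here is a minimal end-to-end sketch using the data from the question. The pooling layer, the sigmoid output, the value of emb_dim, and the training settings are assumptions chosen only to make the example complete. They are not the only way to build on top of the Embedding layer.

```python
import numpy as np
from keras.layers import Input, Embedding, GlobalAveragePooling1D, Dense
from keras.models import Model

texts = np.array([[1, 2, 3, 2, 1],
                  [2, 3, 4, 2, 2],
                  [3, 33, 2, 1, 3]])
labels = np.array([1, 0, 1])

num_words_per_sample = texts.shape[1]  # 5
vocab_size = int(texts.max()) + 1      # must exceed the largest word index (33)
emb_dim = 8                            # hypothetical embedding size

inp = Input(shape=(num_words_per_sample,))
x = Embedding(vocab_size, emb_dim)(inp)  # (batch, 5, 8)
x = GlobalAveragePooling1D()(x)          # average over the words: (batch, 8)
out = Dense(1, activation="sigmoid")(x)  # one probability per sample

model = Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(texts, labels, epochs=2, verbose=0)

preds = model.predict(texts, verbose=0)
print(preds.shape)  # (3, 1)
```

The whole texts matrix is passed to fit directly. Keras treats the first axis as the batch axis, so each row is one training sample.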