Tags: python · python-3.x · pytorch · word-embedding

Word embeddings with multiple categorical features for a single word


I'm looking for a way to implement a word embedding network with LSTM layers in PyTorch, such that the input to the nn.Embedding layer has a different form than vectors of word IDs.

Each word in my case has a corresponding feature vector, so each sentence in my corpus is a vector of vectors. For example, I may have the word "King" with the vector [500, 3, 18], where 500 is the word ID, 3 is the word's color, and 18 is the font size, etc. The embedding layer's role here is to do some automatic feature reduction/extraction.
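For concreteness, a toy batch in this format might look like the following (all values and sizes below are made up):

```python
import torch

# One "sentence" is a sequence of per-word feature vectors:
# [word_id, color_id, font_size] -- e.g. "King" -> [500, 3, 18].
sentence = torch.tensor([
    [500, 3, 18],
    [12,  0, 18],
    [731, 3, 12],
])  # shape: (seq_len=3, num_features=3)

# A batch of such sentences has shape (batch, seq_len, num_features).
batch = sentence.unsqueeze(0)
print(batch.shape)  # torch.Size([1, 3, 3])
```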

How can I feed the embedding layer data in this form? Or do you have any better suggestions?


Solution

  • I am not sure what you mean by a word2vec algorithm with LSTMs: the original word2vec algorithm does not use LSTMs and uses the embeddings directly to predict surrounding words.

    Anyway, it seems you have multiple categorical variables to embed. In your example these are the word ID, the color ID, and the font size (if you round it to integer values). You have two options:

    1. You can create new IDs for all possible combinations of your features and use nn.Embedding for them (e.g., something like word_id * n_colors * n_sizes + color_id * n_sizes + size_id). There is, however, a risk that most of these combined IDs will appear too sparsely in the data to learn reliable embeddings.

    2. Have a separate embedding for each feature. You then need to combine the feature embeddings, and there are basically three ways to do it (see the sketch after this list):

      • Just concatenate the embeddings and let the following layers of the network resolve the combination.
      • Choose the same embedding dimension for all features and average them. (This is probably the one I would start with.)
      • Add an nn.Linear layer (or two, the first with a ReLU activation and the second without) that explicitly combines the embeddings of your features.
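    Here is a minimal sketch of the second option, assuming made-up vocabulary sizes and dimensions; the hypothetical `combine` argument switches between the three strategies above:

```python
import torch
import torch.nn as nn

class MultiFeatureEmbedding(nn.Module):
    """Embeds [word_id, color_id, font_size] triples and combines them.

    Vocabulary sizes and dimensions are assumptions for illustration.
    """
    def __init__(self, n_words=10000, n_colors=16, n_sizes=64,
                 dim=128, combine="mean"):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, dim)
        self.color_emb = nn.Embedding(n_colors, dim)
        self.size_emb = nn.Embedding(n_sizes, dim)
        self.combine = combine
        if combine == "mlp":
            # Explicitly mix the concatenated embeddings back down to `dim`.
            self.mix = nn.Sequential(
                nn.Linear(3 * dim, dim),
                nn.ReLU(),
                nn.Linear(dim, dim),
            )

    def forward(self, x):
        # x: (batch, seq_len, 3) integer tensor of [word, color, size] IDs.
        w = self.word_emb(x[..., 0])
        c = self.color_emb(x[..., 1])
        s = self.size_emb(x[..., 2])
        if self.combine == "mean":    # same dim for all features, averaged
            return (w + c + s) / 3
        if self.combine == "concat":  # let later layers sort it out
            return torch.cat([w, c, s], dim=-1)
        if self.combine == "mlp":     # explicit combination via linear layers
            return self.mix(torch.cat([w, c, s], dim=-1))
        raise ValueError(self.combine)

# The result can be fed to an LSTM as usual:
emb = MultiFeatureEmbedding(combine="mean")
lstm = nn.LSTM(input_size=128, hidden_size=256, batch_first=True)
x = torch.tensor([[[500, 3, 18], [12, 0, 18]]])  # (1, 2, 3)
out, _ = lstm(emb(x))  # out: (1, 2, 256)
```

    Note that with `combine="concat"` the LSTM's `input_size` would have to be `3 * dim` instead of `dim`.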

    If you need to include continuous features that cannot be discretized, you can always take the continuous features, apply a layer or two on top of them, and combine the result with the embeddings of the discrete features.
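    A sketch of that last idea, assuming the font size is kept as a raw continuous value; all names and sizes here are illustrative only:

```python
import torch
import torch.nn as nn

# Assumed shapes: discrete_ids (batch, seq_len, 2), continuous (batch, seq_len, 1).
word_emb = nn.Embedding(10000, 128)
color_emb = nn.Embedding(16, 128)
# Small MLP that projects the raw continuous feature(s) into a learned space.
cont_proj = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 32))

def embed(discrete_ids, continuous):
    w = word_emb(discrete_ids[..., 0])       # (batch, seq_len, 128)
    c = color_emb(discrete_ids[..., 1])      # (batch, seq_len, 128)
    f = cont_proj(continuous)                # (batch, seq_len, 32)
    return torch.cat([w, c, f], dim=-1)      # (batch, seq_len, 288)

ids = torch.tensor([[[500, 3], [12, 0]]])
font = torch.tensor([[[18.0], [11.5]]])
print(embed(ids, font).shape)  # torch.Size([1, 2, 288])
```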