I recently read a paper about embeddings. In Eq. (3), f is a 4096×1 vector, and the author compresses it into theta (a 20×1 vector) using an embedding matrix E, so E must be 20×4096. The equation is simply theta = E*f.
I was wondering whether this can be done in PyTorch, so that during training E is learned automatically.
How do I finish the rest? Thanks so much.
The demo code so far:
import torch
from torch import nn
f = torch.randn(4096, 1)  # stands in for the 4096x1 feature vector from the paper
If your input vectors are one-hot (which is the case "embedding layers" are designed for), you can directly use PyTorch's nn.Embedding layer, which does the above and a bit more. nn.Embedding takes the index of the non-zero entry of each one-hot vector as input, as a long tensor. For example, if the feature vectors are
f = [[0, 0, 1], [1, 0, 0]]
then the input to nn.Embedding would be
input = [2, 0]
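A minimal sketch of that usage (the sizes 3 and 20 here are just illustrative, not from the paper):
import torch
from torch import nn

# 3 possible one-hot positions, each mapped to a 20-dim embedding
embedding = nn.Embedding(num_embeddings=3, embedding_dim=20)

# indices of the non-zero entries of [[0, 0, 1], [1, 0, 0]]
indices = torch.tensor([2, 0], dtype=torch.long)

theta = embedding(indices)  # shape (2, 20): one embedding row per index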
However, what the OP asked about is getting the embedding by matrix multiplication, and I will address that below. You can define a module to do that as follows. Since param is an instance of nn.Parameter, it is registered as a parameter of the module and will be optimized when you call Adam or any other optimizer.
class Embedding(nn.Module):
    def __init__(self, input_dim, embedding_dim):
        super().__init__()
        # learnable embedding matrix, registered as a model parameter
        self.param = torch.nn.Parameter(torch.randn(input_dim, embedding_dim))

    def forward(self, x):
        # x: (batch, input_dim) -> (batch, embedding_dim)
        return torch.mm(x, self.param)
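Here is a rough sketch of how that module could be plugged into the question's setup and trained; the target tensor and the MSE loss are placeholders for illustration, not something from the paper:
import torch
from torch import nn

f = torch.randn(4096, 1)                           # the 4096x1 feature vector from the question
embedding = Embedding(input_dim=4096, embedding_dim=20)
optimizer = torch.optim.Adam(embedding.parameters(), lr=1e-3)

theta = embedding(f.t())                           # f.t() is (1, 4096), so theta is (1, 20)

# placeholder objective, just to show that E (embedding.param) receives gradients
target = torch.zeros(1, 20)
loss = nn.functional.mse_loss(theta, target)

optimizer.zero_grad()
loss.backward()
optimizer.step()                                   # updates embedding.param, i.e. the learned E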
If you look carefully, this is the same as a linear layer with no bias and a slightly different initialization. Therefore, you can achieve the same thing by using a linear layer, as below.
self.embedding = nn.Linear(4096, 20, bias=False)
# change the initial weights to N(0, 1) or whatever is required
self.embedding.weight.data = torch.randn_like(self.embedding.weight)
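Used outside a module, a minimal sketch of the same idea (the variable names are just for illustration):
import torch
from torch import nn

f = torch.randn(4096, 1)

embedding = nn.Linear(4096, 20, bias=False)                   # weight is (20, 4096), playing the role of E
embedding.weight.data = torch.randn_like(embedding.weight)    # re-initialize to N(0, 1)

theta = embedding(f.t())                                      # (1, 4096) -> (1, 20)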