I am reading the "Deep Learning for Coders with fastai & PyTorch" book. I'm still a bit confused as to what the Embedding module does. It seems like a short and simple network, except I can't seem to wrap my head around what Embedding does differently than Linear without a bias. I know it does some faster computational version of a dot product where one of the matrices is a one-hot encoded matrix and the other is the embedding matrix. It does this to in effect select a piece of data? Please point out where I am wrong. Here is one of the simple networks shown in the book.
class DotProduct(Module):
def __init__(self, n_users, n_movies, n_factors):
self.user_factors = Embedding(n_users, n_factors)
self.movie_factors = Embedding(n_movies, n_factors)
def forward(self, x):
users = self.user_factors(x[:,0])
movies = self.movie_factors(x[:,1])
return (users * movies).sum(dim=1)
[...] what Embedding does differently than Linear without a bias.
Essentially everything. torch.nn.Embedding
is a lookup table; it works the same as torch.Tensor
but with a few twists (like possibility to use sparse embedding or default value at specified index).
For example:
import torch
embedding = torch.nn.Embedding(3, 4)
print(embedding.weight)
print(embedding(torch.tensor([1])))
Would output:
Parameter containing:
tensor([[ 0.1420, -0.1886, 0.6524, 0.3079],
[ 0.2620, 0.4661, 0.7936, -1.6946],
[ 0.0931, 0.3512, 0.3210, -0.5828]], requires_grad=True)
tensor([[ 0.2620, 0.4661, 0.7936, -1.6946]], grad_fn=<EmbeddingBackward>)
So we took the first row of the embedding. It does nothing more than that.
Usually when we want to encode some meaning (like word2vec) for each row (e.g. words being close semantically are close in euclidean space) and possibly train them.
torch.nn.Linear
(without bias) is also a torch.Tensor
(weight) but it does operation on it (and the input) which is essentially:
output = input.matmul(weight.t())
every time you call the layer (see source code and functional definition of this layer).
The layer in your code snippet does this:
__init__
(batch_size, 2)
:
(batch_size,)
(so it's different from nn.Linear
which would return (batch_size, out_features)
and perform dot product instead of element-wise multiplication followed by summation like here)This is probably used to train both representations (of users and movies) for some recommender-like system.
I know it does some faster computational version of a dot product where one of the matrices is a one-hot encoded matrix and the other is the embedding matrix.
No, it doesn't. torch.nn.Embedding
can be one hot encoded and might also be sparse, but depending on the algorithms (and whether those support sparsity) there might be performance boost or not.