While working through the TensorFlow tutorials, I came across an embedding trick used in the Wide and Deep tutorial like this.
The tutorial shows how to transform a sparse feature (usually a one-hot encoding) into an embedding vector. I know there are several approaches to creating such embeddings, such as word embeddings, PCA, t-SNE, or matrix factorization. But in this tutorial, they do not show how the embedding for the sparse vector is created. Or does the tutorial just use the neural network itself to learn the embedding?
If you know word embeddings, this transformation should be familiar to you. From "The Deep Model: Neural Network with Embeddings" section:
The embedding values are initialized randomly, and are trained along with all other model parameters to minimize the training loss.
Essentially, what tf.feature_column.embedding_column(occupation, dimension=8) does is create an [N, 8] matrix, where N is the number of occupation values, or the number of buckets if you use hashed categorical columns. Each input occupation value acts like an index that selects an embedding vector of size [8]. The rest of the network works with this [8] vector, without knowing what N is. This vector is often called dense to emphasize the difference between the one-hot encoding of length N, most values of which are zeros, and the [8] vector, all values of which matter.
The embeddings are trainable, so after random initialization they drift to values that turn out to be useful for the rest of the network. This is very similar to word2vec and other word embeddings, and is often a pretty effective representation:
Through dense embeddings, deep models can generalize better and make predictions on feature pairs that were previously unseen in the training data.
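Under the hood this is just a trainable lookup table. A rough hand-written equivalent (the bucket count, dimension, and ids below are made-up values for illustration) looks like this:

```python
import tensorflow as tf

N, dim = 1000, 8  # assumed number of occupation buckets and embedding size

# The [N, 8] embedding matrix: a plain trainable variable, randomly initialized.
embeddings = tf.Variable(tf.random.uniform([N, dim], -1.0, 1.0))

# Each occupation id picks one row; that [8] row is the dense vector the rest
# of the network consumes, and its values are updated by backprop like any
# other weight in the model.
occupation_ids = tf.constant([3, 17, 3])
dense_vectors = tf.nn.embedding_lookup(embeddings, occupation_ids)  # shape [3, 8]
```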