
Difference between feature_column.embedding_column and keras.layers.Embedding in TensorFlow


I have been using keras.layers.Embedding for almost all of my projects. But recently, I wanted to fiddle around with tf.data and came across feature_column.embedding_column.

From the documentation:

feature_column.embedding_column - DenseColumn that converts from sparse, categorical input. Use this when your inputs are sparse, but you want to convert them to a dense representation (e.g., to feed to a DNN).

keras.layers.Embedding - Turns positive integers (indexes) into dense vectors of fixed size. e.g. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]] This layer can only be used as the first layer in a model.
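
For concreteness, here is a minimal sketch of the Keras layer in action (the input_dim/output_dim values below are made-up placeholders, not anything mandated by the docs):

    import tensorflow as tf

    # Integer indices in, dense vectors out; 1000 and 2 are placeholder sizes.
    embedding = tf.keras.layers.Embedding(input_dim=1000, output_dim=2)
    vectors = embedding(tf.constant([[4], [20]]))
    print(vectors.shape)  # (2, 1, 2): one 2-dim vector per index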

My question is: are both APIs doing a similar thing, just on different types of input data (e.g., integer input [0, 1, 2] for keras.layers.Embedding versus its one-hot-encoded representation [[1,0,0],[0,1,0],[0,0,1]] for feature_column.embedding_column)?
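
For reference, the feature_column path I am comparing against looks roughly like this (the feature name 'item' and the sizes are placeholders I chose for illustration):

    import tensorflow as tf

    # The categorical column handles the sparse/one-hot view internally,
    # so the raw input is still integer ids, keyed by feature name.
    cat = tf.feature_column.categorical_column_with_identity('item', num_buckets=1000)
    emb = tf.feature_column.embedding_column(cat, dimension=2)
    dense = tf.keras.layers.DenseFeatures(emb)
    print(dense({'item': tf.constant([0, 1, 2])}).shape)  # (3, 2)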


Solution

  • After reviewing the source code for both operations, here is what I found:

    • both operations rely on tensorflow.python.ops.embedding_ops functionality;
    • keras.layers.Embedding uses dense representations and contains generic Keras code for handling shapes, initializing variables, and so on;
    • feature_column.embedding_column relies on sparse representations and contains functionality to cache results.

    So your guess seems right: the two APIs do similar things, rely on distinct input representations, and each contains some extra logic that doesn't change the essence of what they do.
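
    To make that concrete, the shared core of both APIs boils down to a trainable weight matrix plus an embedding lookup, roughly like this sketch (the sizes are made up):

        import tensorflow as tf

        # A vocab_size x dim weight matrix, with rows selected by integer ids.
        weights = tf.Variable(tf.random.uniform([1000, 2]))
        ids = tf.constant([4, 20])
        print(tf.nn.embedding_lookup(weights, ids).shape)  # (2, 2)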