
Feature Column Pre-trained Embedding


How do I use a pre-trained embedding with tf.feature_column.embedding_column?

I tried to use a pre-trained embedding in tf.feature_column.embedding_column, but it doesn't work. The error is:

ValueError: initializer must be callable if specified. Embedding of column_name: itemx

Here's my code:

import tensorflow as tf  # TF 1.x API (tf.placeholder, tf.Session)

weight, vocab_size, emb_size = _create_pretrained_emb_from_txt(
    FLAGS.vocab, FLAGS.pre_emb)

W = tf.Variable(tf.constant(0.0, shape=[vocab_size, emb_size]),
                trainable=False, name="W")
embedding_placeholder = tf.placeholder(tf.float32, [vocab_size, emb_size])
embedding_init = W.assign(embedding_placeholder)

sess = tf.Session()
sess.run(embedding_init, feed_dict={embedding_placeholder: weight})

itemx_vocab = tf.feature_column.categorical_column_with_vocabulary_file(
    key='itemx',
    vocabulary_file=FLAGS.vocabx)

itemx_emb = tf.feature_column.embedding_column(itemx_vocab,
                                               dimension=emb_size,
                                               initializer=W,
                                               trainable=False)
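The post does not show _create_pretrained_emb_from_txt, so the following is only a hypothetical sketch of what such a helper could look like. It assumes a vocabulary file with one token per line and a GloVe-style text embedding file where each line is a token followed by its vector values, and it returns nested Python lists rather than a NumPy array. The property that matters is row order: categorical_column_with_vocabulary_file assigns each token the id of its (0-based) line in the vocabulary file, so row i of weight has to belong to line i of that file. Note that the helper in the question is fed FLAGS.vocab while the column reads FLAGS.vocabx; if those are different files, the rows will not line up.

```python
from typing import Dict, List, Tuple


def create_pretrained_emb_from_txt(
    vocab_path: str, emb_path: str
) -> Tuple[List[List[float]], int, int]:
    """Hypothetical stand-in for the question's _create_pretrained_emb_from_txt.

    Returns (weight, vocab_size, emb_size) where weight[i] is the vector for
    the token on line i of vocab_path. Tokens missing from emb_path get zeros.
    """
    # Vocabulary: one token per line, kept in file order (line i -> id i).
    with open(vocab_path, encoding="utf-8") as f:
        vocab = [line.rstrip("\n") for line in f]

    # Embeddings: assumed "token v1 v2 ... vN" per line, whitespace separated.
    vectors: Dict[str, List[float]] = {}
    emb_size = 0
    with open(emb_path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            token, values = parts[0], [float(v) for v in parts[1:]]
            if emb_size == 0:
                emb_size = len(values)
            elif len(values) != emb_size:
                raise ValueError(f"inconsistent dimension for {token!r}")
            vectors[token] = values

    # Row i must line up with line i of the vocabulary file.
    weight = [vectors.get(tok, [0.0] * emb_size) for tok in vocab]
    return weight, len(vocab), emb_size
```

Zero rows for unknown tokens are just one choice; small random values are another common option.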

I have also tried initializer=lambda w: W, like this:

itemx_emb = tf.feature_column.embedding_column(itemx_vocab,
                                               dimension=emb_size,
                                               initializer=lambda w:W,
                                               trainable=False)

but then it reports this error:

TypeError: <lambda>() got an unexpected keyword argument 'dtype'
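Both errors come from how the initializer is used. As far as I can tell from the TF 1.x source, embedding_column hands its initializer to tf.get_variable, which calls it as initializer(shape, dtype=..., partition_info=...) and expects an initial value back, not a tf.Variable. Passing W itself fails the "must be callable" check, and lambda w: W accepts only one positional argument, so the dtype keyword raises the TypeError above. The pure-Python model below illustrates this; fake_get_variable and make_pretrained_initializer are hypothetical names, not TensorFlow APIs.

```python
from typing import Any, Callable, List, Sequence


def fake_get_variable(shape: Sequence[int], dtype: str,
                      initializer: Any) -> Any:
    """Hypothetical stand-in that calls the initializer the way TF 1.x's
    tf.get_variable does: shape positionally, dtype/partition_info as keywords."""
    if not callable(initializer):
        raise ValueError("initializer must be callable if specified.")
    return initializer(list(shape), dtype=dtype, partition_info=None)


def make_pretrained_initializer(
        weight: List[List[float]]) -> Callable[..., List[List[float]]]:
    """Wrap a pre-trained matrix in a callable with the expected signature."""
    def _initializer(shape, dtype=None, partition_info=None):
        # Guard against a vocab_size / dimension mismatch with the column.
        actual = [len(weight), len(weight[0]) if weight else 0]
        if list(shape) != actual:
            raise ValueError(f"expected shape {list(shape)}, got {actual}")
        return weight  # an initial value, not a Variable
    return _initializer


if __name__ == "__main__":
    weight = [[0.1, 0.2], [0.3, 0.4]]
    print(fake_get_variable([2, 2], "float32",
                            make_pretrained_initializer(weight)))
    try:
        # Same shape of mistake as initializer=lambda w: W in the question.
        fake_get_variable([2, 2], "float32", lambda w: weight)
    except TypeError as e:
        print("TypeError:", e)
```

In real TF 1.x code, a commonly suggested way to get the same effect is initializer=tf.constant_initializer(weight), with weight a NumPy array of shape [vocab_size, emb_size]. That stores the matrix in the graph as a constant, which can run into the 2 GB GraphDef limit for very large vocabularies.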


Solution

  • I also opened an issue here: https://github.com/tensorflow/tensorflow/issues/20663

    I finally found a way to solve it, although I'm still not clear why the answer above doesn't work. If you know why, I'd appreciate any suggestions!

    Here is my current solution, adapted from Feature Columns Embedding lookup.

    Code:

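    import tensorflow as tf  # TF 1.x; tf.contrib was removed in TF 2.x

    itemx_vocab = tf.feature_column.categorical_column_with_vocabulary_file(
        key='itemx',
        vocabulary_file=FLAGS.vocabx)

    # Initializer that reads the embedding tensor 'w_in' from the checkpoint
    # and remaps its rows from the old vocabulary (the one the checkpoint was
    # trained with) to the column's vocabulary file.
    embedding_initializer_x = tf.contrib.framework.load_embedding_initializer(
        ckpt_path='model.ckpt',
        embedding_tensor_name='w_in',
        new_vocab_size=itemx_vocab.vocabulary_size,
        embedding_dim=emb_size,
        old_vocab_file=FLAGS.vocab_emb,  # the flag's value, not the string 'FLAGS.vocab_emb'
        new_vocab_file=FLAGS.vocabx
    )
    # dimension must match embedding_dim above.
    itemx_emb = tf.feature_column.embedding_column(itemx_vocab,
                                                   dimension=emb_size,
                                                   initializer=embedding_initializer_x,
                                                   trainable=False)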
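If I read the TF 1.x documentation correctly, load_embedding_initializer returns an initializer that loads w_in from the checkpoint, whose rows follow old_vocab_file, and reorders them so they follow new_vocab_file; tokens that appear only in the new vocabulary get values from a fallback initializer (its initializer argument). The plain-Python function below only illustrates that remapping idea; remap_embedding is a hypothetical helper and is not how TensorFlow implements it.

```python
from typing import Callable, List, Sequence


def remap_embedding(
    old_vocab: Sequence[str],
    old_matrix: Sequence[Sequence[float]],
    new_vocab: Sequence[str],
    fallback: Callable[[int], List[float]],
) -> List[List[float]]:
    """Illustrative only: reorder rows of old_matrix (aligned with old_vocab)
    so that row i lines up with new_vocab[i]; unseen tokens use fallback(dim)."""
    if len(old_vocab) != len(old_matrix):
        raise ValueError("old_vocab and old_matrix must have the same length")
    dim = len(old_matrix[0]) if old_matrix else 0
    # Look rows up by token, not by position.
    old_index = {tok: i for i, tok in enumerate(old_vocab)}
    new_matrix = []
    for tok in new_vocab:
        i = old_index.get(tok)
        new_matrix.append(list(old_matrix[i]) if i is not None else fallback(dim))
    return new_matrix


if __name__ == "__main__":
    old_vocab = ["a", "b", "c"]
    old_matrix = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    new_vocab = ["c", "x", "a"]
    print(remap_embedding(old_vocab, old_matrix, new_vocab, lambda d: [0.0] * d))
    # [[3.0, 3.0], [0.0, 0.0], [1.0, 1.0]]
```

Because rows are matched by token, this style of initializer keeps working even when the checkpoint's vocabulary file and the column's vocabulary file list tokens in different orders.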