Tags: python, tensorflow, word2vec, word-embedding

How does the tensorflow word2vec tutorial update embeddings?


This thread comes close: What is the purpose of weights and biases in tensorflow word2vec example?

But I am still missing something from my interpretation of this: https://github.com/tensorflow/tensorflow/blob/r1.2/tensorflow/examples/tutorials/word2vec/word2vec_basic.py

From what I understand, you feed the network the indices of target and context words from your dictionary.

    _, loss_val = session.run([optimizer, loss], feed_dict=feed_dict)
    average_loss += loss_val
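The two tensors being fed are just integer word indices. A minimal sketch of what one feed might look like, assuming the placeholder shapes used in the tutorial script (the index values here are made up):

    import numpy as np
    import tensorflow as tf

    batch_size = 8

    # Placeholders as defined earlier in the tutorial script (assumed shapes):
    train_inputs = tf.placeholder(tf.int32, shape=[batch_size])      # target (center) words
    train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])   # context words

    # Made-up indices standing in for what the batch generator would return:
    feed_dict = {
        train_inputs: np.array([12, 12, 6, 6, 195, 195, 2, 2], dtype=np.int32),
        train_labels: np.array([[6], [3084], [12], [195], [6], [2], [195], [46]],
                               dtype=np.int32),
    }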

The batch inputs are then looked up, returning the vectors that were randomly initialized at the beginning:

    embeddings = tf.Variable(
        tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
    # Look up embeddings for inputs.
    embed = tf.nn.embedding_lookup(embeddings, train_inputs)
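As a side note, the lookup is just a differentiable row gather from the embeddings table, which is why gradients can later flow back into it. A toy illustration (not part of the tutorial):

    import numpy as np
    import tensorflow as tf

    embeddings = tf.Variable(tf.random_uniform([5, 3], -1.0, 1.0))  # toy 5-word vocab, 3-dim vectors
    ids = tf.constant([2, 0, 2])
    embed = tf.nn.embedding_lookup(embeddings, ids)                  # gathers rows 2, 0, 2

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        looked_up, table = sess.run([embed, embeddings])
        print(np.allclose(looked_up, table[[2, 0, 2]]))              # True: it is just a row gather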

Then an optimizer adjusts the weights and biases to best predict the label, as opposed to the num_sampled random alternatives:

    loss = tf.reduce_mean(
        tf.nn.nce_loss(weights=nce_weights,
                       biases=nce_biases,
                       labels=train_labels,
                       inputs=embed,
                       num_sampled=num_sampled,
                       num_classes=vocabulary_size))

    # Construct the SGD optimizer using a learning rate of 1.0.
    optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(loss)
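For intuition only, here is a rough per-example version of the NCE objective in plain numpy. It is a simplification (it omits the sampling-probability correction that tf.nn.nce_loss applies to the logits), and the function is made up for illustration:

    import numpy as np

    def nce_loss_sketch(embed_vec, true_idx, sampled_idx, nce_weights, nce_biases):
        """Rough sketch: score the true word high and the sampled noise words low."""
        def logit(word):
            return np.dot(embed_vec, nce_weights[word]) + nce_biases[word]

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        loss = -np.log(sigmoid(logit(true_idx)))          # true (input, label) pair
        for word in sampled_idx:
            loss += -np.log(1.0 - sigmoid(logit(word)))   # randomly sampled "negative" words
        return loss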

My questions are as follows:

  1. Where does the embeddings variable get updated? It appears to me that I could get the final result either by running the index of a word through the neural network, or by just taking the final_embeddings vectors and using those. But I do not understand where embeddings is ever changed from its random initialization.

  2. If I were to draw this computation graph, what would it look like (or better yet, what is the best way to actually do so)?

  3. Is this running all of the context/target pairs in the batch at once? Or one by one?


Solution

  • Embeddings: embeddings is a variable. It gets updated every time you do backprop, i.e. every time you run optimizer together with loss (see the sketch after this list).

  • Graph: Did you try saving the graph and displaying it in TensorBoard? Is that what you're looking for? (The sketch below also shows how to write the graph out.)

  • Batching: At least in the example you linked, it is doing batch processing using the function at line 96: https://github.com/tensorflow/tensorflow/blob/r1.2/tensorflow/examples/tutorials/word2vec/word2vec_basic.py#L96
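Putting the first two points together, here is a self-contained toy version of the tutorial graph (sizes shrunk, names kept from the script). Treat it as a sketch under those assumptions rather than the tutorial itself; it shows that embeddings is one of the variables the optimizer updates, and how to write the graph out for TensorBoard:

    import math
    import tensorflow as tf

    vocabulary_size, embedding_size, batch_size, num_sampled = 50, 8, 4, 5

    train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
    train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])

    embeddings = tf.Variable(tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
    embed = tf.nn.embedding_lookup(embeddings, train_inputs)

    nce_weights = tf.Variable(tf.truncated_normal([vocabulary_size, embedding_size],
                                                  stddev=1.0 / math.sqrt(embedding_size)))
    nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

    loss = tf.reduce_mean(
        tf.nn.nce_loss(weights=nce_weights, biases=nce_biases,
                       labels=train_labels, inputs=embed,
                       num_sampled=num_sampled, num_classes=vocabulary_size))
    optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(loss)

    # minimize(loss) creates update ops for every trainable variable the loss
    # depends on -- embeddings, nce_weights and nce_biases are all in this list:
    print([v.name for v in tf.trainable_variables()])

    # The gradient w.r.t. embeddings is sparse (IndexedSlices): only the rows
    # that were looked up in the current batch get moved on each step.
    print(type(tf.gradients(loss, embeddings)[0]))

    # Write the graph out, then run: tensorboard --logdir=/tmp/word2vec_graph
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        tf.summary.FileWriter('/tmp/word2vec_graph', sess.graph)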

    Please correct me if I misunderstood your question.