
How to implement word2vec CBOW in Keras with a shared Embedding layer and negative sampling?


I want to create a word-embedding pretraining network that adds something on top of word2vec CBOW. Therefore, I'm trying to implement word2vec CBOW first. Since I'm very new to Keras, I can't figure out how to implement CBOW in it.

Initialization:

I have built the vocabulary and have the mapping from words to integers.

Input to the (yet to be implemented) network:

A list of 2*k + 1 integers (representing the central word and 2*k words in context)
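For example, with k = 2 and a toy vocabulary, one such input row could be assembled like this (just an illustration; the corpus and mapping are placeholders):

    tokens = "the quick brown fox jumps over the lazy dog".split()
    word_to_index = {w: i for i, w in enumerate(sorted(set(tokens)))}

    k = 2
    i = 4  # position of the central word, here "jumps"
    context_words = tokens[i - k:i] + tokens[i + 1:i + k + 1]  # 2*k context words
    sample = [word_to_index[w] for w in context_words] + [word_to_index[tokens[i]]]
    # sample is a list of 2*k + 1 integers: context indices followed by the central word index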

Network Specification

A shared Embedding layer should take this list of integers and give their corresponding vector outputs. Then the mean of the 2*k context vectors should be taken (I believe this can be done using add_node(layer, name, inputs=[2*k vectors], merge_mode='ave')).
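In other words, the context representation I'm after is simply the arithmetic mean of the 2*k context embeddings, something like this in plain NumPy (only to show the intended computation, not the Keras code):

    import numpy as np

    embedding_matrix = np.random.rand(10, 100)           # toy weights: 10 words, 100 dims
    context_indices = [1, 3, 5, 7]                       # 2*k context word indices (k = 2)
    context_vectors = embedding_matrix[context_indices]  # shape (2*k, 100)
    cbow_vector = context_vectors.mean(axis=0)           # shape (100,)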

It will be very helpful if anyone can share a small code-snippet of this.

P.S.: I was looking at word2veckeras, but couldn't follow its code because it also uses gensim.

UPDATE 1:

I want to share the embedding layer in the network, so that it handles both the 2*k context words and the current word. I can do this by taking all 2*k + 1 word indices in the input and writing a custom lambda function that does the averaging. But after that I also want to add a negative-sampling network, for which I'll have to take the embeddings of more words and compute their dot product with the context vector. Can someone provide an example where the Embedding layer is a shared node in a Graph() network?
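To make the negative-sampling part concrete, the quantities I want the shared embedding to feed into are roughly these (a NumPy sketch of the math only; indices and sizes are made up):

    import numpy as np

    embedding_matrix = np.random.rand(10, 100)                 # toy weights
    cbow_vector = embedding_matrix[[1, 3, 5, 7]].mean(axis=0)  # mean of the 2*k context embeddings

    word_vector = embedding_matrix[2]                          # embedding of the true central word
    negative_vectors = embedding_matrix[[0, 4, 6, 8, 9]]       # embeddings of 5 negative samples

    positive_score = word_vector.dot(cbow_vector)              # should be driven up during training
    negative_scores = negative_vectors.dot(cbow_vector)        # shape (5,), should be driven down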


Solution

  • Graph() has been deprecated in Keras.

    Any arbitrary network can be created with the Keras functional API. Below is demo code that builds a word2vec CBOW model with negative sampling and tests it on randomized inputs. (It uses the Keras 1 functional API; in Keras 2 the merge function was replaced by merge layers such as Dot.)

    from keras import backend as K
    import numpy as np
    from keras.models import Model
    from keras.layers import Input, Lambda, merge
    from keras.layers.embeddings import Embedding
    
    k = 3          # context window size (k words on each side)
    context_size = 2*k
    neg = 5        # number of negative samples per (word, context) pair
    
    # Toy weight matrix for the embeddings: word i is mapped to a constant vector of i's,
    # which makes it easy to check the outputs by eye.
    embedding = []
    for i in range(10):
        embedding.append(np.full(100, i))
    embedding = np.array(embedding)
    print(embedding)
    
    # Creating the CBOW model with negative sampling
    word_index = Input(shape=(1,))
    context = Input(shape=(context_size,))
    negative_samples = Input(shape=(neg,))
    
    # A single Embedding layer, shared between the central word, the context words
    # and the negative samples.
    shared_embedding_layer = Embedding(input_dim=10, output_dim=100, weights=[embedding])
    
    word_embedding = shared_embedding_layer(word_index)
    context_embeddings = shared_embedding_layer(context)
    negative_words_embedding = shared_embedding_layer(negative_samples)
    
    # CBOW vector: mean of the 2*k context embeddings
    cbow = Lambda(lambda x: K.mean(x, axis=1), output_shape=(100,))(context_embeddings)
    
    # Dot product of the central word with the CBOW vector (positive score),
    # and of each negative sample with the CBOW vector (negative scores).
    word_context_product = merge([word_embedding, cbow], mode='dot')
    negative_context_product = merge([negative_words_embedding, cbow], mode='dot')
    
    model = Model(input=[word_index, context, negative_samples],
                  output=[word_context_product, negative_context_product])
    
    model.compile(optimizer='rmsprop', loss='mse', metrics=['accuracy'])
    
    # Sanity check on random inputs
    input_context = np.random.randint(10, size=(1, context_size))
    input_word = np.random.randint(10, size=(1,))
    input_negative = np.random.randint(10, size=(1, neg))
    
    print("word, context, negative samples")
    print(input_word.shape, input_word)
    print(input_context.shape, input_context)
    print(input_negative.shape, input_negative)
    
    output_dot_product, output_negative_product = model.predict([input_word, input_context, input_negative])
    print("word cbow dot product")
    print(output_dot_product.shape, output_dot_product)
    print("cbow negative dot product")
    print(output_negative_product.shape, output_negative_product)
    
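    To actually train the model you still need targets for the two outputs. One simple option with the mse loss used above is to push the positive dot product towards 1 and the negative ones towards 0. A rough sketch, assuming those 1/0 targets and deriving the target shapes from the predicted outputs so nothing about the dot-merge shapes is hard-coded:
    
    # Rough training sketch: targets shaped like the model's own outputs.
    positive_target = np.ones_like(output_dot_product)        # push word . cbow towards 1
    negative_target = np.zeros_like(output_negative_product)  # push negatives . cbow towards 0
    
    loss = model.train_on_batch([input_word, input_context, input_negative],
                                [positive_target, negative_target])
    print(loss)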

    Hope it helps!

    UPDATE 1:

    I've completed the code and uploaded it here