
How to implement word2vec CBOW in Keras with a shared Embedding layer and negative sampling?


I want to create a word-embedding pretraining network that adds something on top of word2vec CBOW. Therefore, I'm trying to implement word2vec CBOW first. Since I'm very new to Keras, I can't figure out how to implement CBOW in it.

Initialization:

I have built the vocabulary and have the mapping from words to integers.

Input to the (yet to be implemented) network:

A list of 2*k + 1 integers (representing the central word and 2*k words in context)
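For example, with k = 2 and a toy vocabulary, one such input row could be assembled like this (just an illustration; the corpus and mapping are placeholders):

    tokens = "the quick brown fox jumps over the lazy dog".split()
    word_to_index = {w: i for i, w in enumerate(sorted(set(tokens)))}

    k = 2
    i = 4  # position of the central word, here "jumps"
    context_words = tokens[i - k:i] + tokens[i + 1:i + k + 1]  # 2*k context words
    sample = [word_to_index[w] for w in context_words] + [word_to_index[tokens[i]]]
    # sample is a list of 2*k + 1 integers: context indices followed by the central word index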

Network Specification

A shared Embedding layer should take this list of integers and give their corresponding vector outputs. Then the mean of the 2*k context vectors should be taken (I believe this can be done using add_node(layer, name, inputs=[2*k vectors], merge_mode='ave')).
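In other words, the context representation I'm after is simply the arithmetic mean of the 2*k context embeddings, something like this in plain NumPy (only to show the intended computation, not the Keras code):

    import numpy as np

    embedding_matrix = np.random.rand(10, 100)           # toy weights: 10 words, 100 dims
    context_indices = [1, 3, 5, 7]                       # 2*k context word indices (k = 2)
    context_vectors = embedding_matrix[context_indices]  # shape (2*k, 100)
    cbow_vector = context_vectors.mean(axis=0)           # shape (100,)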

It will be very helpful if anyone can share a small code-snippet of this.

P.S.: I was looking at word2veckeras, but couldn't follow its code because it also uses gensim.

UPDATE 1:

I want to share the embedding layer in the network, so that it handles both the 2*k context words and the current word. I can do this by taking all 2*k + 1 word indices in the input and writing a custom lambda function that does the averaging. But after that I also want to add a negative-sampling network, for which I'll have to take the embeddings of more words and compute their dot product with the context vector. Can someone provide an example where the Embedding layer is a shared node in a Graph() network?
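To make the negative-sampling part concrete, the quantities I want the shared embedding to feed into are roughly these (a NumPy sketch of the math only; indices and sizes are made up):

    import numpy as np

    embedding_matrix = np.random.rand(10, 100)                 # toy weights
    cbow_vector = embedding_matrix[[1, 3, 5, 7]].mean(axis=0)  # mean of the 2*k context embeddings

    word_vector = embedding_matrix[2]                          # embedding of the true central word
    negative_vectors = embedding_matrix[[0, 4, 6, 8, 9]]       # embeddings of 5 negative samples

    positive_score = word_vector.dot(cbow_vector)              # should be driven up during training
    negative_scores = negative_vectors.dot(cbow_vector)        # shape (5,), should be driven down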


Solution

  • Graph() has been deprecated in Keras.

    Any arbitrary network can be created with the Keras functional API. Below is demo code that builds a word2vec CBOW model with negative sampling and tests it on randomized inputs. (It uses the Keras 1 functional API; in Keras 2 the merge function was replaced by merge layers such as Dot.)

    from keras import backend as K
    import numpy as np
    from keras.models import Model
    from keras.layers import Input, Lambda, merge
    from keras.layers.embeddings import Embedding
    
    k = 3          # context window size (k words on each side)
    context_size = 2*k
    neg = 5        # number of negative samples per (word, context) pair
    
    # Toy weight matrix for the embeddings: word i is mapped to a constant vector of i's,
    # which makes it easy to check the outputs by eye.
    embedding = []
    for i in range(10):
        embedding.append(np.full(100, i))
    embedding = np.array(embedding)
    print(embedding)
    
    # Creating the CBOW model with negative sampling
    word_index = Input(shape=(1,))
    context = Input(shape=(context_size,))
    negative_samples = Input(shape=(neg,))
    
    # A single Embedding layer, shared between the central word, the context words
    # and the negative samples.
    shared_embedding_layer = Embedding(input_dim=10, output_dim=100, weights=[embedding])
    
    word_embedding = shared_embedding_layer(word_index)
    context_embeddings = shared_embedding_layer(context)
    negative_words_embedding = shared_embedding_layer(negative_samples)
    
    # CBOW vector: mean of the 2*k context embeddings
    cbow = Lambda(lambda x: K.mean(x, axis=1), output_shape=(100,))(context_embeddings)
    
    # Dot product of the central word with the CBOW vector (positive score),
    # and of each negative sample with the CBOW vector (negative scores).
    word_context_product = merge([word_embedding, cbow], mode='dot')
    negative_context_product = merge([negative_words_embedding, cbow], mode='dot')
    
    model = Model(input=[word_index, context, negative_samples],
                  output=[word_context_product, negative_context_product])
    
    model.compile(optimizer='rmsprop', loss='mse', metrics=['accuracy'])
    
    # Sanity check on random inputs
    input_context = np.random.randint(10, size=(1, context_size))
    input_word = np.random.randint(10, size=(1,))
    input_negative = np.random.randint(10, size=(1, neg))
    
    print("word, context, negative samples")
    print(input_word.shape, input_word)
    print(input_context.shape, input_context)
    print(input_negative.shape, input_negative)
    
    output_dot_product, output_negative_product = model.predict([input_word, input_context, input_negative])
    print("word cbow dot product")
    print(output_dot_product.shape, output_dot_product)
    print("cbow negative dot product")
    print(output_negative_product.shape, output_negative_product)
    
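    To actually train the model you still need targets for the two outputs. One simple option with the mse loss used above is to push the positive dot product towards 1 and the negative ones towards 0. A rough sketch, assuming those 1/0 targets and deriving the target shapes from the predicted outputs so nothing about the dot-merge shapes is hard-coded:
    
    # Rough training sketch: targets shaped like the model's own outputs.
    positive_target = np.ones_like(output_dot_product)        # push word . cbow towards 1
    negative_target = np.zeros_like(output_negative_product)  # push negatives . cbow towards 0
    
    loss = model.train_on_batch([input_word, input_context, input_negative],
                                [positive_target, negative_target])
    print(loss)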

    Hope it helps!

    UPDATE 1:

    I've completed the code and uploaded it here