Tags: nlp, word2vec

What are some common ways to get sentence vector from corresponding word vectors?


I have successfully implemented a word2vec model to generate word embeddings (word vectors). I now need to generate sentence vectors from these word vectors so that I can feed a neural network to summarize a text corpus. What are the common approaches to generating sentence vectors from word vectors?


Solution

  • You can try adding an LSTM/RNN encoder in front of your actual neural network and feeding your network with the hidden states of the encoder (which act as document representations).

    The benefit of doing this is that your document embeddings will be trained for your specific task of text summarization.

    I don't know which framework you are using; otherwise I would have included some code to get you started.

    EDIT 1: Added a code snippet (Keras)

    from keras.layers import Input, Embedding, LSTM

    # Placeholders -- replace with your own values
    MAX_SEQ_LEN = ...  # maximum sequence length
    VOCAB_SIZE = ...   # vocabulary size
    EMBED_DIM = ...    # word-vector dimension
    RNN_SIZE = ...     # LSTM hidden-state size

    word_in = Input(shape=(MAX_SEQ_LEN,))

    emb_word = Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM,
                         input_length=MAX_SEQ_LEN, mask_zero=True)(word_in)

    lstm = LSTM(units=RNN_SIZE, return_sequences=False,
                recurrent_dropout=0.5, name="lstm_1")(emb_word)


    Then add any type of dense layer that takes vectors as input.

    The LSTM takes input of shape batch_size * sequence_length * word_vector_dimension and produces output of shape batch_size * rnn_size, which you can use as the document embedding.
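    To illustrate the shapes end to end, here is a minimal runnable sketch of the encoder above with a dense head attached. All hyperparameter values (vocabulary size, dimensions, the sigmoid output layer) are illustrative assumptions, not taken from the original post; `input_length` is omitted since newer Keras versions no longer accept it.

    ```python
    # Hypothetical end-to-end sketch of the LSTM encoder; hyperparameters
    # below are illustrative placeholders, not values from the original post.
    import numpy as np
    from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
    from tensorflow.keras.models import Model

    MAX_SEQ_LEN = 50    # assumed maximum sequence length
    VOCAB_SIZE = 10000  # assumed vocabulary size
    EMBED_DIM = 100     # assumed word-vector dimension
    RNN_SIZE = 128      # assumed LSTM hidden-state size

    word_in = Input(shape=(MAX_SEQ_LEN,))
    emb_word = Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM,
                         mask_zero=True)(word_in)
    # return_sequences=False: keep only the final hidden state per document
    lstm = LSTM(units=RNN_SIZE, return_sequences=False,
                recurrent_dropout=0.5, name="lstm_1")(emb_word)
    # Example dense head on top of the document embedding
    out = Dense(1, activation="sigmoid")(lstm)
    model = Model(word_in, out)

    # A second model that stops at the LSTM exposes the document embeddings
    encoder = Model(word_in, lstm)
    batch = np.random.randint(1, VOCAB_SIZE, size=(4, MAX_SEQ_LEN))
    doc_vecs = encoder.predict(batch, verbose=0)
    print(doc_vecs.shape)  # (4, 128): batch_size * rnn_size
    ```

    The `encoder` model shares weights with `model`, so after training on your summarization objective, calling `encoder.predict` yields task-specific document embeddings.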