keras, word2vec, word-embedding

How to use the Keras Embedding layer when there is more than one text feature


I understand how to use the Keras Embedding layer when there is a single text feature, as in IMDB review classification. However, I am confused about how to use Embedding layers for a classification problem with more than one text feature. For example, I have a dataset with 2 text features, Diagnosis Text and Requested Procedure, and a binary label (1 for approved, 0 for not approved). In the example below, x_train has 2 columns, Diagnosis and Procedure, unlike the IMDB dataset. Do I need to create 2 Embedding layers, one for Diagnosis and one for Procedure? If so, what code changes would be required?

from keras import preprocessing
from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense

x_train = preprocessing.sequence.pad_sequences(x_train, maxlen=20)
x_test = preprocessing.sequence.pad_sequences(x_test, maxlen=20)
model = Sequential()
model.add(Embedding(10000, 8, input_length=20))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

Solution

  • You have a couple of choices. You could concatenate the two features into one and create a single embedding for both of them. Here is the logic:

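    from keras.preprocessing.sequence import pad_sequences
    # Assumes X['diag'] and X['proc'] each hold one list of token ids per
    # sample, and that both were encoded with the same vocabulary.
    all_features = [d + p for d, p in zip(X['diag'], X['proc'])]
    X = pad_sequences(all_features, maxlen=max_len)
    # Build the model as usual; as you can see, only a single embedding layer
    # is needed.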
    

    Or you can use the Functional API and build a multi-input model:

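    # Sizes are assumptions: sequences padded to 20 tokens and a 10000-word
    # vocabulary (as in the question), with 512-dimensional embeddings.
    from keras.layers import Input, Embedding, Flatten, Concatenate, Dense
    from keras.models import Model

    diag_inp = Input(shape=(20,))
    diag_emb = Flatten()(Embedding(10000, 512)(diag_inp))
    proc_inp = Input(shape=(20,))
    proc_emb = Flatten()(Embedding(10000, 512)(proc_inp))

    # concatenate them to make a single vector per sample
    merged = Concatenate()([diag_emb, proc_emb])
    # one sigmoid unit, since the label is binary
    out = Dense(1, activation='sigmoid')(merged)
    model = Model(inputs=[diag_inp, proc_inp], outputs=[out])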
    

    In other words, you can either learn one embedding over the concatenated text, or learn a separate embedding per feature and concatenate the embedded vectors inside the model.
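
As for the code changes the question asks about, here is a minimal end-to-end sketch of the two-input variant. It is only an illustration under assumptions: the vocabulary size, sequence length, and embedding size are taken from the question's snippet, the data is random integers standing in for the padded Diagnosis and Procedure sequences, and the names x_diag, x_proc, and n_samples are hypothetical. The main difference from the single-feature case is that fit and predict receive a list with one array per input.

```python
import numpy as np
from keras.layers import Input, Embedding, Flatten, Concatenate, Dense
from keras.models import Model

# Assumed sizes (the first three mirror the question's snippet).
vocab_size = 10000   # vocabulary size per text feature
max_len = 20         # padded sequence length
emb_dim = 8          # embedding dimension
n_samples = 64       # hypothetical size of the synthetic dataset

# Synthetic stand-ins for the padded, integer-encoded text columns.
rng = np.random.default_rng(0)
x_diag = rng.integers(1, vocab_size, size=(n_samples, max_len))
x_proc = rng.integers(1, vocab_size, size=(n_samples, max_len))
y = rng.integers(0, 2, size=(n_samples,))

# One input and one embedding per text feature.
diag_inp = Input(shape=(max_len,), dtype="int32", name="diag")
proc_inp = Input(shape=(max_len,), dtype="int32", name="proc")
diag_emb = Flatten()(Embedding(vocab_size, emb_dim)(diag_inp))
proc_emb = Flatten()(Embedding(vocab_size, emb_dim)(proc_inp))

# Join the two flattened embeddings into a single vector per sample.
merged = Concatenate()([diag_emb, proc_emb])
out = Dense(1, activation="sigmoid")(merged)

model = Model(inputs=[diag_inp, proc_inp], outputs=out)
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["acc"])

# Inputs are passed as a list, in the same order as Model(inputs=...).
model.fit([x_diag, x_proc], y, epochs=1, batch_size=32,
          validation_split=0.2, verbose=0)
preds = model.predict([x_diag, x_proc], verbose=0)
```

Flatten is used here only to mirror the question's model; a pooling or recurrent layer over each embedding would slot into the same place.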