python tensorflow keras nlp conv-neural-network

How to extract activations from dense layer

I am trying to implement the preprocessing code for this paper (code in this repo). The preprocessing code is described in the paper here:

"A convolutional neural network (Kim, 2014) is used to extract textual features from the transcript of the utterances. We use a single convolutional layer followed by max-pooling and a fully connected layer to obtain the feature representations for the utterances. The input to this network is the 300 dimensional pretrained 840B GloVe vectors (Pennington et al., 2014). We use filters of size 3, 4 and 5 with 50 feature maps in each. The convoluted features are then max-pooled with a window size of 2 followed by the ReLU activation (Nair and Hinton, 2010). These are then concatenated and fed to a 100 dimensional fully connected layer, whose activations form the representation of the utterance. This network is trained at utterance level with the emotion labels."

The authors of the paper state that the CNN feature extraction code can be found in this repo. However, that code is for a complete model that does sequence classification. It does everything in the quote above except the bolded part (and it goes further, to do the classification itself). I want to edit the code so that it concatenates the pooled features, feeds them into the 100d layer, and then extracts the activations. The data to train on is found in the repo (it's the IMDB dataset).

The output should be a (100, ) tensor for each sequence.

Here's the code for the CNN model:

import tensorflow as tf
import numpy as np


class TextCNN(object):
    """
    A CNN for text classification.
    Uses an embedding layer, followed by a convolutional, max-pooling and softmax layer.
    """
    def __init__(
      self, sequence_length, num_classes, vocab_size,
      embedding_size, filter_sizes, num_filters, l2_reg_lambda=0.0):

        # Placeholders for input, output and dropout
        self.input_x = tf.placeholder(tf.int32, [None, sequence_length], name="input_x")
        self.input_y = tf.placeholder(tf.float32, [None, num_classes], name="input_y")
        self.dropout_keep_prob = tf.placeholder(tf.float32, name="dropout_keep_prob")

        # Keeping track of l2 regularization loss (optional)
        l2_loss = tf.constant(0.0)

        # Embedding layer
        with tf.device('/cpu:0'), tf.name_scope("embedding"):
            self.W = tf.Variable(
                tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0),
                name="W")
            self.embedded_chars = tf.nn.embedding_lookup(self.W, self.input_x)
            self.embedded_chars_expanded = tf.expand_dims(self.embedded_chars, -1)

        # Create a convolution + maxpool layer for each filter size
        pooled_outputs = []
        for i, filter_size in enumerate(filter_sizes):
            with tf.name_scope("conv-maxpool-%s" % filter_size):
                # Convolution Layer
                filter_shape = [filter_size, embedding_size, 1, num_filters]
                W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W")
                b = tf.Variable(tf.constant(0.1, shape=[num_filters]), name="b")
                conv = tf.nn.conv2d(
                    self.embedded_chars_expanded,
                    W,
                    strides=[1, 1, 1, 1],
                    padding="VALID",
                    name="conv")
                # Apply nonlinearity
                h = tf.nn.relu(tf.nn.bias_add(conv, b), name="relu")
                # Maxpooling over the outputs
                pooled = tf.nn.max_pool(
                    h,
                    ksize=[1, sequence_length - filter_size + 1, 1, 1],
                    strides=[1, 1, 1, 1],
                    padding='VALID',
                    name="pool")
                pooled_outputs.append(pooled)

        # Combine all the pooled features
        num_filters_total = num_filters * len(filter_sizes)
        self.h_pool = tf.concat(pooled_outputs, 3)
        self.h_pool_flat = tf.reshape(self.h_pool, [-1, num_filters_total])

        # Add dropout
        with tf.name_scope("dropout"):
            self.h_drop = tf.nn.dropout(self.h_pool_flat, self.dropout_keep_prob)

        # Final (unnormalized) scores and predictions
        with tf.name_scope("output"):
            W = tf.get_variable(
                "W",
                shape=[num_filters_total, num_classes],
                initializer=tf.contrib.layers.xavier_initializer())
            b = tf.Variable(tf.constant(0.1, shape=[num_classes]), name="b")
            l2_loss += tf.nn.l2_loss(W)
            l2_loss += tf.nn.l2_loss(b)
            self.scores = tf.nn.xw_plus_b(self.h_drop, W, b, name="scores")
            self.predictions = tf.argmax(self.scores, 1, name="predictions")

        # Calculate mean cross-entropy loss
        with tf.name_scope("loss"):
            losses = tf.nn.softmax_cross_entropy_with_logits(logits=self.scores, labels=self.input_y)
            self.loss = tf.reduce_mean(losses) + l2_reg_lambda * l2_loss

        # Accuracy
        with tf.name_scope("accuracy"):
            correct_predictions = tf.equal(self.predictions, tf.argmax(self.input_y, 1))
            self.accuracy = tf.reduce_mean(tf.cast(correct_predictions, "float"), name="accuracy")

I want to do the concatenation into the 100d layer to get the activations. I think this belongs around line 59 (right before the # Add dropout section near the bottom), with the rest below it commented out. How do I do this?

Solution

The convolutional neural network you are trying to implement is a great baseline in the NLP domain. It was introduced for the first time in this paper (Kim, 2014).

I found the code you posted very useful, but it may be more complex than we need. Below I rewrite the network in plain Keras (the only thing missing is the regularization).

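from tensorflow.keras.layers import (Input, Embedding, Conv1D, MaxPooling1D,
                                     Concatenate, Flatten, Dropout, Dense)
from tensorflow.keras.models import Model

def TextCNN(sequence_length, num_classes, vocab_size, 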
            embedding_size, filter_sizes, num_filters, 
            embedding_matrix):

    sequence_input = Input(shape=(sequence_length,), dtype='int32')

    embedding_layer = Embedding(vocab_size,
                                embedding_size,
                                weights=[embedding_matrix],
                                input_length=sequence_length,
                                trainable=False)

    embedded_sequences = embedding_layer(sequence_input)

    convs = []
    for fsz in filter_sizes:
        x = Conv1D(num_filters, fsz, activation='relu', padding='same')(embedded_sequences)
        x = MaxPooling1D(pool_size=2)(x)
        convs.append(x)

    x = Concatenate(axis=-1)(convs)
    x = Flatten()(x)
    x = Dropout(0.5)(x)
    output = Dense(num_classes, activation='softmax')(x)

    model = Model(sequence_input, output)
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])

    return model

The embedding layer is initialized with the pretrained GloVe weights. You can load those, or learn a new embedding representation with another technique (Word2Vec or FastText) and load that instead. The model is then fit as usual.
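The embedding_matrix argument has to be built before the model is created. Here is a minimal sketch of one way to do that, assuming a GloVe-style text format ("word v1 v2 ... vN" per line) and a word_index dictionary that maps each word to an integer id, with id 0 reserved for padding. The function name build_embedding_matrix, the toy vectors and the toy vocabulary are all hypothetical and only serve as illustration.

```python
import numpy as np


def build_embedding_matrix(word_index, glove_lines, embedding_size):
    """Return a (vocab_size, embedding_size) matrix aligned with word_index.

    word_index: dict mapping word -> integer id (id 0 assumed to be padding).
    glove_lines: iterable of text lines in GloVe format: "word v1 v2 ... vN".
    """
    vocab_size = max(word_index.values()) + 1
    # Words missing from the pretrained file keep an all-zero row
    # (one possible choice; random initialization is another).
    matrix = np.zeros((vocab_size, embedding_size), dtype="float32")
    for line in glove_lines:
        # Split from the right so tokens that contain spaces stay intact.
        parts = line.rstrip().rsplit(" ", embedding_size)
        if len(parts) != embedding_size + 1:
            continue  # skip malformed lines
        word = parts[0]
        if word in word_index:
            matrix[word_index[word]] = np.asarray(parts[1:], dtype="float32")
    return matrix


# Toy stand-in for a GloVe file (3-dimensional vectors, made-up values).
toy_glove = ["good 0.1 0.2 0.3", "bad -0.1 -0.2 -0.3", "unused 1.0 1.0 1.0"]
toy_word_index = {"good": 1, "bad": 2, "movie": 3}  # hypothetical vocabulary
embedding_matrix = build_embedding_matrix(toy_word_index, toy_glove, 3)
print(embedding_matrix.shape)  # (4, 3)
```

With a real file you would pass the open file handle as glove_lines and set embedding_size to the dimension of the vectors you downloaded.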

Note that the above is the original architecture of the network. If you would like to insert a 100-unit dense layer before the output, it can be modified simply, in this way (here a code reference):

def TextCNN(sequence_length, num_classes, vocab_size, 
            embedding_size, filter_sizes, num_filters, 
            embedding_matrix):

    sequence_input = Input(shape=(sequence_length,), dtype='int32')

    embedding_layer = Embedding(vocab_size,
                                embedding_size,
                                weights=[embedding_matrix],
                                input_length=sequence_length,
                                trainable=False)

    embedded_sequences = embedding_layer(sequence_input)

    convs = []
    for fsz in filter_sizes:
        x = Conv1D(num_filters, fsz, activation='relu', padding='same')(embedded_sequences)
        x = MaxPooling1D(pool_size=2)(x)
        convs.append(x)

    x = Concatenate(axis=-1)(convs)
    x = Flatten()(x)
    x = Dense(100, activation='relu', name='extractor')(x)
    x = Dropout(0.5)(x)
    output = Dense(num_classes, activation='softmax')(x)

    model = Model(sequence_input, output)
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])

    return model

# embedding_matrix must have shape (vocab_size, embedding_size), here (3333, 100)
model = TextCNN(sequence_length=50, num_classes=10, vocab_size=3333, 
        embedding_size=100, filter_sizes=[3,4,5], num_filters=50, 
        embedding_matrix=embedding_matrix)

model.fit(....)
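To see what the 'extractor' layer actually receives, it can help to trace the per-sample shapes by hand. The sketch below is plain Python arithmetic, not Keras code, and the helper name trace_shapes is hypothetical. It assumes the settings used in the function above: Conv1D with padding='same' and stride 1, and MaxPooling1D with its default 'valid' padding and strides equal to pool_size.

```python
def trace_shapes(sequence_length, embedding_size, filter_sizes, num_filters,
                 pool_size=2, dense_units=100):
    """Return the per-sample output shape after each stage of the network."""
    shapes = {"embedding": (sequence_length, embedding_size)}
    # padding='same' with stride 1 keeps the time dimension unchanged,
    # so every filter size yields the same number of steps.
    conv_steps = sequence_length
    # Non-overlapping pooling with 'valid' padding floors the division.
    pooled_steps = conv_steps // pool_size
    shapes["conv_pool_branch"] = (pooled_steps, num_filters)
    # Concatenate(axis=-1) stacks the branches along the channel axis.
    channels = num_filters * len(filter_sizes)
    shapes["concat"] = (pooled_steps, channels)
    shapes["flatten"] = (pooled_steps * channels,)
    shapes["extractor"] = (dense_units,)
    return shapes


shapes = trace_shapes(sequence_length=50, embedding_size=100,
                      filter_sizes=[3, 4, 5], num_filters=50)
for stage, shape in shapes.items():
    print(stage, shape)
# embedding (50, 100)
# conv_pool_branch (25, 50)
# concat (25, 150)
# flatten (3750,)
# extractor (100,)
```

So with the example arguments, each sequence is flattened to 3750 values before the 100-unit layer turns it into the (100,) representation asked for in the question.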

To extract the features we are interested in, we need the output of the 100-unit Dense layer (the one we named 'extractor'). I also suggest this tutorial on filter and feature extraction.

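import numpy as np

# Sub-model that shares the trained weights and stops at the 'extractor' layer
extractor = Model(model.input, model.get_layer('extractor').output)
# Random token ids stand in for real data: 1000 sequences of length 50
representation = extractor.predict(np.random.randint(0, 200, (1000, 50)))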

The representation will be an array of shape (n_sample, 100), i.e. one (100,) vector per input sequence.
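The extractor expects an integer array of shape (n_sample, sequence_length), so real utterances of varying length have to be brought to a fixed length first. Here is a minimal NumPy sketch of that step, assuming id 0 is the padding id and that padding and truncation both happen at the end of the sequence. The helper name pad_to_length and the token ids are hypothetical.

```python
import numpy as np


def pad_to_length(sequences, length, pad_id=0):
    """Pad or truncate lists of token ids to a fixed length (at the end)."""
    batch = np.full((len(sequences), length), pad_id, dtype="int32")
    for row, seq in enumerate(sequences):
        trimmed = seq[:length]  # drop tokens beyond the fixed length
        batch[row, :len(trimmed)] = trimmed
    return batch


token_ids = [[5, 8, 2], [7, 1, 4, 9, 3, 6]]  # made-up token ids
batch = pad_to_length(token_ids, length=5)
print(batch)
# [[5 8 2 0 0]
#  [7 1 4 9 3]]
```

The resulting batch can be passed to extractor.predict as long as length matches the sequence_length the model was built with.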