
Keras 1D segmentation model always classifies even number of items


I'm trying to train a 1D CNN to identify specific parts of a text string.

The inputs are arrays of shape (128,1) containing 128 characters, and the aim is for the network to classify each of the characters into a particular class. For purposes of illustration, an input array could look like this:

array(['3', '!', 'd', 'o', 'g', '.', '?', '8', '7', 'a', 'p', 'p', 'l',
       'e', 'f', 'd', '$', '5'], dtype='<U1')

And the corresponding label looks like this:

array([0, 0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 2, 0, 0, 0, 0])

The idea being that the network will classify the characters "d", "o", "g" into class 1 (say, animals) and "a", "p", "p", "l", "e" into class 2 (fruits) and the rest into class 0.

The plan is to build a network with an architecture similar to U-Net, but for now I'm experimenting with a very simple downsample/upsample network which looks like this:

import tensorflow as tf

def get_model(seq_size, n_classes):
    
    inputs = tf.keras.Input(shape=seq_size)
    
    # Downsample phase

    x = tf.keras.layers.Conv1D(32,11,padding="same")(inputs)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation("relu")(x)
    
    x = tf.keras.layers.MaxPooling1D(2,padding="same")(x)    
    
    x = tf.keras.layers.Conv1D(64,5,padding="same")(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation("relu")(x)
    
    x = tf.keras.layers.MaxPooling1D(2,padding="same")(x)  
    
    # Upsample phase
    
    x = tf.keras.layers.Conv1DTranspose(128,5,padding="same")(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation("relu")(x)
    
    x = tf.keras.layers.UpSampling1D(2)(x)  
    
    x = tf.keras.layers.Conv1DTranspose(256,7,padding="same")(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation("relu")(x)
    
    x = tf.keras.layers.UpSampling1D(2)(x)     
    
    outputs = tf.keras.layers.Conv1D(n_classes,1,activation="softmax",padding="same")(x)
    
    model = tf.keras.Model(inputs,outputs)
    return model 

With an input shape of (128,1) and n_classes = 5.

The model works quite well for a baseline, but it has an interesting quirk which I'm trying to get my head around: when it makes predictions over characters, it always classifies an even number of characters (or "pixels" if thinking about this as analogous to an image segmentation task). So in the above example it would identify !dog or dog. as belonging to class 1, and 7apple or applef as belonging to class 2.

This is only a problem if the word contains an odd number of characters, which makes me think that it's probably something to do with max-pooling and upsampling operations. I've tried to find an answer by understanding how these operations work in Keras, but this hasn't been fruitful. So if anybody could shed some light onto why the predictions are always an even number of characters, and how I might rectify that, I would be very grateful!

EDIT from advice in the comments:

To clarify, the arrays are encoded simply using the ord function, and then min/max normalized to the range 0:1.
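The post doesn't include the encoding code itself; as a minimal sketch of what's described (the variable names are illustrative, and whether the min/max bounds come from each array or from the whole training set isn't stated, so per-array bounds here are an assumption):

```python
import numpy as np

chars = np.array(['3', '!', 'd', 'o', 'g', '.'], dtype='<U1')
codes = np.array([ord(c) for c in chars], dtype="float32")

# Min/max normalize to the range [0, 1]; in practice the bounds would
# likely be fixed from the training data rather than per array.
lo, hi = codes.min(), codes.max()
x = (codes - lo) / (hi - lo)
x = x.reshape(-1, 1)  # shape (seq_len, 1), matching the Conv1D input
```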

I'm using sparse categorical cross-entropy for the loss function, and the training setup is as follows:

loss = tf.keras.losses.SparseCategoricalCrossentropy()
opt = tf.keras.optimizers.Adam()

model.compile(optimizer=opt,loss=loss,metrics=["accuracy"])

callbacks = [tf.keras.callbacks.ModelCheckpoint("trial.h5",save_best_only=True)]

epochs = 10
model.fit(train_gen, epochs=epochs, validation_data=test_gen, callbacks=callbacks)

Where train_gen and test_gen are data generators built as a tf.keras.utils.Sequence subclass.


Solution

  • I think when you use UpSampling1D each value is repeated twice, which means the input to the last layer contains pairwise-duplicated values. It would then give the same predicted class for adjacent characters. If my guess is correct, you would always see the same prediction for characters 2k and 2k+1.

    You could confirm this by inspecting the input x in

    outputs = tf.keras.layers.Conv1D(n_classes,1,activation="softmax",padding="same")(x)
    

    It should look like [a, a, b, b, c, c, ...]
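    The duplication is also easy to reproduce in isolation (a minimal check, not taken from the original code):

    ```python
    import tensorflow as tf

    x = tf.constant([[[1.0], [2.0], [3.0]]])  # shape (batch=1, steps=3, channels=1)
    y = tf.keras.layers.UpSampling1D(size=2)(x)

    # Each timestep is simply repeated, giving the [a, a, b, b, c, c] pattern:
    print(y.numpy().reshape(-1))  # [1. 1. 2. 2. 3. 3.]
    ```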

    To solve the issue you could probably add an additional step between the final x = tf.keras.layers.UpSampling1D(2)(x) and the outputs = ... line, for example a Conv1D with a kernel size greater than 1, so adjacent (duplicated) positions can be mixed before the per-character classification. Alternatively, replacing UpSampling1D with a strided Conv1DTranspose would let the network learn the upsampling instead of copying values.
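    A sketch of such an additional step, on a deliberately shrunken version of the model (the layer sizes here are illustrative, not taken from the original architecture):

    ```python
    import tensorflow as tf

    inputs = tf.keras.Input(shape=(128, 1))
    x = tf.keras.layers.Conv1D(32, 11, padding="same", activation="relu")(inputs)
    x = tf.keras.layers.MaxPooling1D(2, padding="same")(x)

    # Option 1: a strided transpose convolution learns its own upsampling
    # filter instead of duplicating each timestep like UpSampling1D(2) does.
    x = tf.keras.layers.Conv1DTranspose(64, 5, strides=2, padding="same",
                                        activation="relu")(x)

    # Option 2 (can be combined with the above): a kernel wider than 1
    # before the softmax lets adjacent, formerly duplicated, positions
    # receive different class scores.
    outputs = tf.keras.layers.Conv1D(5, 7, padding="same",
                                     activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    # One class distribution per input character: output shape (None, 128, 5)
    ```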