keras, deep-learning, classification, multilabel-classification, transformer-model

Multilabel classification of a sequence, how to do it?


I am quite new to deep learning, especially Keras. I have a simple classification problem and I don't know how to solve it. What I don't understand is the general process of classification: converting the input data into tensors, preparing the labels, and so on.

Let's say we have three classes, 1, 2, 3.

Each sequence consists of these classes and needs to be classified as one of them. For example, the dataset could be

  • Sequence 1, 1, 1, 2 is labeled 2
  • Sequence 2, 1, 3, 3 is labeled 1
  • Sequence 3, 1, 2, 1 is labeled 3

and so on.

This means the input dataset will be

[[1, 1, 1, 2],
 [2, 1, 3, 3],
 [3, 1, 2, 1]]

and the label will be

[[2],
 [1],
 [3]]

Now one thing that I do understand is one-hot encoding the classes. Because we have three classes, every 1 will be converted into [1, 0, 0], 2 into [0, 1, 0], and 3 into [0, 0, 1]. Converting the example above gives a dataset of shape 3 x 4 x 3 and labels of shape 3 x 1 x 3.
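That one-hot conversion can be sketched with plain numpy, by indexing the identity matrix (this is just one way to do it; Keras also offers a utility for the same thing):

```python
import numpy as np

# Classes 1..3 map to one-hot positions 0..2, so subtract 1 before indexing
data = np.array([[1, 1, 1, 2],
                 [2, 1, 3, 3],
                 [3, 1, 2, 1]])
labels = np.array([2, 1, 3])

x = np.eye(3)[data - 1]     # one-hot sequences, shape (3, 4, 3)
y = np.eye(3)[labels - 1]   # one-hot labels, shape (3, 3)
```

Note that the labels come out here with shape (3, 3) rather than (3, 1, 3); the singleton dimension is usually squeezed away before training.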

Another thing that I understand is that the last layer should be a softmax layer. This way, when test data (e.g. [1, 2, 3, 1]) comes in, it will be softmaxed and the probabilities of this sequence belonging to class 1, 2, or 3 will be calculated.
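The softmax step can be illustrated with a few lines of plain numpy (the logit values here are made up, not from any trained model):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the result sums to 1
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([2.0, 0.5, -1.0])  # hypothetical raw outputs of the last layer
probs = softmax(logits)              # three probabilities summing to 1
```

Each output entry is a probability for one class, and the predicted class is simply the index of the largest one.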

Am I right? If so, can you give me an explanation/example of the process of classifying these sequences?

Thank you in advance.


Solution

  • Here are a few clarifications that you seem to be asking about.

    • If your input data has the shape (4,), then your input tensor will have the shape (batch_size, 4).
    • Softmax is the correct activation for your prediction (last) layer given your desired output, because you have a classification problem with multiple classes. This will yield output of shape (batch_size, 3): the probabilities of each potential classification, summing to one across all classes. For example, if the true class is class 0, then a single prediction might look something like [0.9714, 0.0113, 0.0173].
    • Batch size isn't hard-coded into the network, hence it is represented in model.summary() as None; e.g. the network's last-layer output shape would be written (None, 3).
    • A softmax prediction layer is normally paired with the categorical_crossentropy loss function (or sparse_categorical_crossentropy if your labels are integer-encoded rather than one-hot).
    • The architecture of a network remains up to you, but you'll at least need a way in and a way out. In Keras (as you've tagged), there are a few ways to do this. Here are some examples:

    Example with Keras Sequential

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import InputLayer, Dense

    model = Sequential()
    model.add(InputLayer(input_shape=(4,)))   # sequence of length four
    model.add(Dense(3, activation='softmax')) # three possible classes
    

    Example with Keras Functional

    from tensorflow.keras.models import Model
    from tensorflow.keras.layers import Input, Dense

    input_tensor = Input(shape=(4,))
    x = Dense(3, activation='softmax')(input_tensor)
    model = Model(input_tensor, x)
    

    Example declaring the input shape in the first layer instead of a separate InputLayer (Sequential):

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    model = Sequential()
    model.add(Dense(666, activation='relu', input_shape=(4,))) # hidden width is arbitrary
    model.add(Dense(3, activation='softmax'))
    

    Hope that helps!
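Putting the pieces together, here is a minimal end-to-end sketch for the example data, feeding the one-hot sequences of shape (4, 3) through the Functional API. The hidden width (32) and epoch count are arbitrary choices for illustration:

```python
import numpy as np
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Flatten, Dense

# One-hot encode the example data (classes 1..3 -> positions 0..2)
data = np.array([[1, 1, 1, 2], [2, 1, 3, 3], [3, 1, 2, 1]])
labels = np.array([2, 1, 3])
x = np.eye(3)[data - 1]     # shape (3, 4, 3)
y = np.eye(3)[labels - 1]   # shape (3, 3)

inputs = Input(shape=(4, 3))                 # sequence of 4 one-hot classes
h = Flatten()(inputs)                        # (4, 3) -> (12,)
h = Dense(32, activation='relu')(h)          # arbitrary hidden width
outputs = Dense(3, activation='softmax')(h)  # probabilities over the 3 classes
model = Model(inputs, outputs)

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x, y, epochs=50, verbose=0)

probs = model.predict(x, verbose=0)          # shape (3, 3), each row sums to ~1
```

With only three training examples this will of course just memorize the data, but it shows the full flow: encode, build, compile with categorical_crossentropy, fit, and read the class probabilities off the softmax output.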