Search code examples
tensorflowkerasconv-neural-networkhuggingface-transformerstransformer-model

Problem connecting transformer output to CNN input in Keras


I need to build a transformer-based architecture in Tensorflow following the encoder-decoder approach where the encoder is a preexisting Huggingface Distilbert model and the decoder is a CNN.

Inputs: a text containing texts with several phrases in a row. Outputs: codes according to taxonomic criteria. My data file has 7387 pairs text-label in TSV format:

text \t code
This is example text number one. It might contain some other phrases. \t C21
This is example text number two. It might contain some other phrases. \t J45.1
This is example text number three. It might contain some other phrases. \t A27

The remainder of the code is this:

        text_file = "data/datafile.tsv"
        with open(text_file) as f:
                lines = f.read().split("\n")[:-1]
                text_and_code_pairs = []
                for line in lines:
                        text, code = line.split("\t")
                        text_and_code_pairs.append((text, code))


        random.shuffle(text_and_code_pairs)
        num_val_samples = int(0.10 * len(text_and_code_pairs))
        num_train_samples = len(text_and_code_pairs) - 3 * num_val_samples
        train_pairs = text_and_code_pairs[:num_train_samples]
        val_pairs = text_and_code_pairs[num_train_samples : num_train_samples + num_val_samples]
        test_pairs = text_and_code_pairs[num_train_samples + num_val_samples :]

        train_texts = [fst for (fst,snd) in train_pairs]
        train_labels = [snd for (fst,snd) in train_pairs]
        val_texts = [fst for (fst,snd) in val_pairs]
        val_labels = [snd for (fst,snd) in val_pairs]
        test_texts = [fst for (fst,snd) in test_pairs]
        test_labels = [snd for (fst,snd) in test_pairs]

        distilbert_encoder = TFDistilBertModel.from_pretrained("distilbert-base-multilingual-cased")
        tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-multilingual-cased")

        train_encodings = tokenizer(train_texts, truncation=True, padding=True)
        val_encodings = tokenizer(val_texts, truncation=True, padding=True)
        test_encodings = tokenizer(test_texts, truncation=True, padding=True)

        train_dataset = tf.data.Dataset.from_tensor_slices((
                dict(train_encodings),
                train_labels
        ))
        val_dataset = tf.data.Dataset.from_tensor_slices((
                dict(val_encodings),
                val_labels
        ))
        test_dataset = tf.data.Dataset.from_tensor_slices((
                dict(test_encodings),
                test_labels
        ))

        model = build_model(distilbert_encoder)
        model.fit(train_dataset.batch(64), validation_data=val_dataset, epochs=3, batch_size=64)
        model.predict(test_dataset, verbose=1)

Lastly, the build_model function:

def build_model(transformer, max_len=512):
        model = tf.keras.models.Sequential()
        # Encoder
        inputs = layers.Input(shape=(max_len,), dtype=tf.int32)
        distilbert = transformer(inputs)
        # LAYER - something missing here?
        # Decoder
        conv1D = tf.keras.layers.Conv1D(filters=5, kernel_size=10)(distilbert)
        pooling = tf.keras.layers.MaxPooling1D(pool_size=2)(conv1D)
        flat = tf.keras.layers.Flatten()(pooling)
        fc = tf.keras.layers.Dense(1255, activation='relu')(flat)
        softmax = tf.keras.layers.Dense(1255, activation='softmax')(fc)
        model = tf.keras.models.Model(inputs = inputs, outputs = softmax)
        model.compile(tf.keras.optimizers.Adam(learning_rate=5e-5), loss="categorical_crossentropy", metrics=['accuracy'])
        print(model.summary())
        return model

I managed to narrow down the possible locations of my problem. After changing from sequential to functional Keras API, I get the following error:

Traceback (most recent call last):
  File "keras_transformer.py", line 99, in <module>
    main()
  File "keras_transformer.py", line 94, in main
    model = build_model(distilbert_encoder)
  File "keras_transformer.py", line 23, in build_model
    conv1D = tf.keras.layers.Conv1D(filters=5, kernel_size=10)(distilbert)
  File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 897, in __call__
    self._maybe_build(inputs)
  File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 2416, in _maybe_build
    self.build(input_shapes)  # pylint:disable=not-callable
  File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/keras/layers/convolutional.py", line 152, in build
    input_shape = tensor_shape.TensorShape(input_shape)
  File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 771, in __init__
    self._dims = [as_dimension(d) for d in dims_iter]
  File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 771, in <listcomp>
    self._dims = [as_dimension(d) for d in dims_iter]
  File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 716, in as_dimension
    return Dimension(value)
  File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 200, in __init__
    None)
  File "<string>", line 3, in raise_from
TypeError: Dimension value must be integer or None or have an __index__ method, got 'last_hidden_state'

It seems that the error lies in the connection between the output of the transformer and the input of the convolutional layer. Am I supposed to include another layer between them so as to adapt the output of the transformer? If so, what would be the best option?I'm using tensorflow==2.2.0, transformers==4.5.1 and Python 3.6.9


Solution

  • I think the problem is to call the right tensor for the tensorflow layer after the dilbert instance. Because distilbert = transformer(inputs) returns an instance rather than a tensor like in tensorflow, e.g., pooling = tf.keras.layers.MaxPooling1D(pool_size=2)(conv1D). pooling is the output tensor of the MaxPooling1D layer.

    I fix your problem by calling the last_hidden_state variable of the distilbert instance (i.e. output of the dilbert model), and this will be your input to the next Conv1D layer.

    import os
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # suppress Tensorflow messages
    
    from transformers import TFDistilBertModel, DistilBertModel
    import tensorflow as tf
    
    distilbert_encoder = TFDistilBertModel.from_pretrained("distilbert-base-multilingual-cased")
    
    
    def build_model(transformer, max_len=512):
            # model = tf.keras.models.Sequential()
            # Encoder
            inputs = tf.keras.layers.Input(shape=(max_len,), dtype=tf.int32)
            distilbert = transformer(inputs)
            # Decoder
            ###### !!!!!! #########
            conv1D = tf.keras.layers.Conv1D(filters=5, kernel_size=10)(distilbert.last_hidden_state) 
            ###### !!!!!! #########        
            pooling = tf.keras.layers.MaxPooling1D(pool_size=2)(conv1D)
            flat = tf.keras.layers.Flatten()(pooling)
            fc = tf.keras.layers.Dense(1255, activation='relu')(flat)
            softmax = tf.keras.layers.Dense(1255, activation='softmax')(fc)
            model = tf.keras.models.Model(inputs = inputs, outputs = softmax)
            model.compile(tf.keras.optimizers.Adam(learning_rate=5e-5), loss="categorical_crossentropy", metrics=['accuracy'])
            print(model.summary())
            return model
    
    
    model = build_model(distilbert_encoder)
    

    This returns,

    Layer (type)                 Output Shape              Param #   
    =================================================================
    input_1 (InputLayer)         [(None, 512)]             0         
    _________________________________________________________________
    tf_distil_bert_model (TFDist TFBaseModelOutput(last_hi 134734080 
    _________________________________________________________________
    conv1d (Conv1D)              (None, 503, 5)            38405     
    _________________________________________________________________
    max_pooling1d (MaxPooling1D) (None, 251, 5)            0         
    _________________________________________________________________
    flatten (Flatten)            (None, 1255)              0         
    _________________________________________________________________
    dense (Dense)                (None, 1255)              1576280   
    _________________________________________________________________
    dense_1 (Dense)              (None, 1255)              1576280   
    =================================================================
    Total params: 137,925,045
    Trainable params: 137,925,045
    Non-trainable params: 0
    

    Note: I assume you mean tf.keras.layers.Input by layers.Input in your build_model function.