Tags: python, tensorflow, keras, nlp, embedding

TensorFlow error when concatenating char and word embeddings


I want to concatenate character embeddings (generated with a CNN) with my word embeddings (GloVe vectors), but I get an error because the shape of the character embeddings differs from that of the word embeddings.

How can I fix the error, or otherwise concatenate these?

            self.character_embedding_weights = tf.get_variable(
                "character_embedding_weights",
                shape=[dataset.alphabet_size, parameters['character_embedding_dimension']],
                initializer=initializer)
            embedded_characters = tf.nn.embedding_lookup(self.character_embedding_weights,
                                                         self.input_token_character_indices, name='embedded_characters')

            if self.verbose:
                print("embedded_characters: {0}".format(embedded_characters))
            utils_tf.variable_summaries(self.character_embedding_weights)
            s = tf.shape(embedded_characters)
            char_embeddings = tf.reshape(embedded_characters, shape=[-1,25,20])

            # Conv #1
            conv1 = tf.layers.conv1d(
                inputs=char_embeddings,
                filters=30,
                kernel_size=3,
                padding="valid",
                activation=tf.nn.relu)

            # Conv #2
            conv2 = tf.layers.conv1d(
                inputs=conv1,
                filters=30,
                kernel_size=3,
                padding="valid",
                activation=tf.nn.relu)
            pool2 = tf.layers.max_pooling1d(inputs=conv2, pool_size=2, strides=2)
            # # Dense Layer
            character_embed_output = tf.layers.dense(inputs=pool2, units=32, activation=tf.nn.relu)

Here, I'm concatenating the token and char embeddings.

            with tf.variable_scope("concatenate_token_and_character_vectors"):

                if self.verbose: 
                    print('embedded_tokens: {0}'.format(embedded_tokens))

                token_lstm_input = tf.concat([character_embed_output, embedded_tokens], 
                    axis=1, name='token_lstm_input')

I get this error:

ValueError: Shape must be rank 3 but is rank 2 for 'concatenate_token_and_character_vectors/token_lstm_input' (op: 'ConcatV2') with input shapes: [?,10,32], [?,100], [].

I'm working with this repo: https://github.com/Franck-Dernoncourt/NeuroNER. It uses an LSTM for the character-level embedding, and I want to use a CNN instead.

The repo has a spot where it uses an LSTM for the character-level embedding; my CNN replacement is the code shown above.

Please comment if any other information or code is needed.


Solution

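  • I finally resolved the problem by flattening the char embedding, after which it can be concatenated with the word embeddings. Flattening turns the rank-3 tensor of shape [?,10,32] into a rank-2 tensor of shape [?,320], which matches the rank of the [?,100] word embeddings. Adding this line made it work: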

    character_embed_output = tf.layers.Flatten()(character_embed_output)
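
To see where the shapes in the error message come from, and why flattening helps, here is a minimal sketch of the shape bookkeeping in plain Python. It does not call TensorFlow; the helper names (`conv1d_valid_length`, `max_pool1d_length`, `concat_shape`) are hypothetical and only mimic the shape rules for `padding="valid"` convolution, the default `"valid"` max pooling, and concatenation. The sizes (25 characters per token, 20-dimensional character embeddings, 100-dimensional token embeddings) are taken from the reshape call and the error message above; `None` stands for the unknown batch dimension.

```python
# Shape arithmetic only; all helper names below are hypothetical illustrations.

def conv1d_valid_length(length, kernel_size, stride=1):
    """Output length of a 1-D convolution with padding='valid'."""
    return (length - kernel_size) // stride + 1


def max_pool1d_length(length, pool_size, strides):
    """Output length of 1-D max pooling with padding='valid'."""
    return (length - pool_size) // strides + 1


def concat_shape(shapes, axis):
    """Shape after concatenation along `axis` (dims on `axis` must be known).

    Raises ValueError when ranks differ or when any other dimension differs,
    which is the same kind of check that makes the concat op fail.
    """
    rank = len(shapes[0])
    if any(len(s) != rank for s in shapes):
        raise ValueError("rank mismatch: {}".format(shapes))
    out = list(shapes[0])
    for s in shapes[1:]:
        for i in range(rank):
            if i != axis and s[i] != out[i]:
                raise ValueError("dimension mismatch on axis {}".format(i))
        out[axis] += s[axis]
    return out


# Trace the character CNN from the question.
length = 25                                    # reshape to [-1, 25, 20]
length = conv1d_valid_length(length, 3)        # conv1: 23 steps, 30 filters
length = conv1d_valid_length(length, 3)        # conv2: 21 steps, 30 filters
length = max_pool1d_length(length, 2, 2)       # pool2: 10 steps
char_shape = [None, length, 32]                # dense acts on the last axis only
token_shape = [None, 100]                      # from the error message

# Flatten merges every non-batch dimension into one.
flat_shape = [None, char_shape[1] * char_shape[2]]

print(char_shape)                                        # [None, 10, 32]
print(flat_shape)                                        # [None, 320]
print(concat_shape([flat_shape, token_shape], axis=1))   # [None, 420]
```

Calling `concat_shape([char_shape, token_shape], axis=1)` on the unflattened shape raises a `ValueError`, mirroring the rank 3 versus rank 2 complaint in the original error. After flattening, each token gets a 320-dimensional character vector, so the concatenated LSTM input is 420-dimensional per token.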