Tags: tensorflow, lstm, tensor, attention-model

Tensorflow sequential matrix multiplication


I have two tensors of the following shapes:

tensor1 => shape(?, ?, 100) # corresponds to [batch_size, max_time, embedding_size]
tensor2 => shape(?, 100) # corresponds to [batch_size, embedding_size]

What I wish to do is, for every [100]-dimensional vector in tensor2, compute its product with the corresponding [max_time, 100] matrix in tensor1, giving batch_size vectors of dimension max_time; that is the same as a [batch_size, max_time] matrix.
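For concreteness, this is a batched matrix-vector product, and one way to express it is with tf.einsum, contracting the shared embedding axis (a sketch using the tensor names above; note that some older TensorFlow releases restrict einsum over undefined dimensions):

    # scores[b, t] = sum_e tensor1[b, t, e] * tensor2[b, e]
    scores = tf.einsum('bte,be->bt', tensor1, tensor2)  # [batch_size, max_time]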

For those who know: I am basically trying to implement content-based attention over the encoded inputs produced by the encoder of a seq2seq model. All the [max_time]-dimensional vectors are attention scores that I later pass through a softmax.
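For illustration, that softmax step would look like this (assuming scores is the [batch_size, max_time] tensor from the sketch above):

    # Normalize the scores along the time axis to get attention weights.
    attention = tf.nn.softmax(scores)  # softmax over the last axis, max_time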

I am aware that TensorFlow provides AttentionWrapper as well as various helpers in the contrib package. However, I want to do this myself because I am experimenting with the attention mechanism to obtain a hybrid attention mask.

I have tried tf.while_loop, but I got stuck on the ? (unknown) shapes when unrolling the loop. A vectorized implementation also doesn't seem very straightforward to me. Please help.


Solution

  • What you can do is use tf.matmul and treat your vectors as 100 × 1 matrices. The multiplication then yields a [batch_size, max_time, 1] tensor, so squeeze out the trailing singleton dimension to get the [batch_size, max_time] result:

    tensor2 = tf.expand_dims(tensor2, 2)     # [batch_size, 100, 1]
    result = tf.matmul(tensor1, tensor2)     # [batch_size, max_time, 1]
    result = tf.squeeze(result, [2])         # [batch_size, max_time]
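    For example, a quick shape check (a minimal sketch; the placeholder names and the dummy batch/time sizes 4 and 7 are illustrative):

    import numpy as np
    import tensorflow as tf

    tensor1 = tf.placeholder(tf.float32, [None, None, 100])  # [batch_size, max_time, 100]
    tensor2 = tf.placeholder(tf.float32, [None, 100])        # [batch_size, 100]

    expanded = tf.expand_dims(tensor2, 2)                    # [batch_size, 100, 1]
    result = tf.squeeze(tf.matmul(tensor1, expanded), [2])   # [batch_size, max_time]

    with tf.Session() as sess:
        out = sess.run(result, feed_dict={
            tensor1: np.random.rand(4, 7, 100),
            tensor2: np.random.rand(4, 100),
        })
        print(out.shape)  # (4, 7)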