I have two tensors of the following shapes:
tensor1 => shape(?, ?, 100) # corresponds to [batch_size, max_time, embedding_size]
tensor2 => shape(?, 100) # corresponds to [batch_size, embedding_size]
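(For concreteness, a minimal setup with these shapes could look like the following; the placeholder definitions are only illustrative.)

import tensorflow as tf

# Illustrative placeholders matching the shapes above (None = unknown at graph-build time)
tensor1 = tf.placeholder(tf.float32, [None, None, 100])  # [batch_size, max_time, embedding_size]
tensor2 = tf.placeholder(tf.float32, [None, 100])        # [batch_size, embedding_size]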
What I wish to do is, for every [100]-dimensional vector in tensor2, take its matrix product with the corresponding [max_time, 100] matrix in tensor1, to get batch_size vectors of length max_time each; which is the same as a [batch_size, max_time] matrix.
For those who know: I am basically trying to implement content-based attention over the encoded inputs produced by the encoder of a seq2seq model. All the [max_time]-dimensional vectors are just the attention scores that I later softmax.
I am aware that TensorFlow provides AttentionWrapper as well as various helpers in the contrib package. However, I wish to do this myself because I am experimenting with the attention mechanism to obtain a hybrid attention mask.
I have tried tf.while_loop, but got stuck on the unknown (?) batch dimension when unrolling the loop. A vectorized implementation also doesn't seem very straightforward to me. Please help.
What you can do is use tf.matmul and treat your vectors as [100, 1] matrices:
tensor2 = tf.expand_dims(tensor2, 2)   # [batch_size, 100, 1]
result = tf.matmul(tensor1, tensor2)   # [batch_size, max_time, 1]
result = tf.squeeze(result, axis=2)    # [batch_size, max_time]
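Equivalently, if you find the shape bookkeeping easier to read, tf.einsum expresses the same batched product in one call, and you can then softmax over the max_time axis (variable names below assume the original, un-expanded tensor2):

# Same computation via einsum: contract the embedding axis e per batch element b
scores = tf.einsum('bte,be->bt', tensor1, tensor2)  # [batch_size, max_time]
attention = tf.nn.softmax(scores)                   # softmax over the max_time axis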