I am trying to do a weighted sum of matrices in tensorflow. Unfortunately, my dimensions are not small and I have a problem with memory. Another option is that I doing something completely wrong
I have two tensors U with shape (B,F,M) and A with shape (C,B). I would like to do weighted sum and stacking.
For each index c from C, I have vector of weights a from A, with shape (B,). I want to use it for the weighted sum of U to get matrix U_t with shape (F, M). This is pretty same with this, where I found small help.
Unfortunately, I want to do this for each vector a in A to get C matrices U_tc in list. U_tc have mentioned shape (F,M). After that I concatenate all matrices in list to get super matrix with shape (C*F,M)
My values are C=2500, M=500, F=80, B=300
In the beginning, I tried the very naive approach with many loop and element selection which generate very much operation. Now with help from this, I have following:
U = tf.Variable(tf.truncated_normal([B, F, M],stddev=1.0 ,dtype=tf.float32) #just for example
A = tf.Variable(tf.truncated_normal([C, B],stddev=1.0) ,dtype=tf.float32) #just for example
U_t = []
for ccc in xrange(C):
a = A[ccc,:]
a_broadcasted = tf.tile(tf.reshape(a,[B,1,1]), tf.stack([1,F,M]))
T_p.append(tf.reduce_sum(tf.multiply(U,a_broadcasted), axis=0))
U_tcs = tf.concat(U_t,axis=0)
Unfortunately, this is failing at memory error. I am not sure if I did something wrong, or it is because computation has a too much mathematic operation? Because I think... variables aren't too large for memory, right? At least, I had larger variables before and it was ok. (I have 16 GB GPU memory)
Am I doing that weighted sum correctly?
Any idea how to do it more effective?
I will appreciate any help. Thanks.
1. Weighted sum and Concatenation
You can use vector operations directly without loops when memory is not limited.
import tensorflow as tf
C,M,F,B=2500,500,80,300
U = tf.Variable(tf.truncated_normal([B, F, M],stddev=1.0 ,dtype=tf.float32)) #just for example
A = tf.Variable(tf.truncated_normal([C, B],stddev=1.0) ,dtype=tf.float32) #just for example
# shape=(C,B,1,1)
A_new = tf.expand_dims(tf.expand_dims(A,-1),-1)
# shape=(B,F,M)
U_t = tf.reduce_sum(tf.multiply(A_new , U),axis=1)
# shape=(C*F,M)
U_tcs = tf.reshape(U_t,(C*F,M))
2. Memory error
In fact, I also had memory errors when I ran the above code.
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[2500,300,80,500]...
With a little modification of the above code, it works properly on my 8GB GPU memory.
import tensorflow as tf
C,M,F,B=2500,500,80,300
U = tf.Variable(tf.truncated_normal([B, F, M],stddev=1.0 ,dtype=tf.float32)) #just for example
A = tf.Variable(tf.truncated_normal([C, B],stddev=1.0) ,dtype=tf.float32) #just for example
# shape=(C,B,1,1)
A_new = tf.expand_dims(tf.expand_dims(A,-1),-1)
U_t = []
for ccc in range(C):
a = A_new[ccc,:]
a_broadcasted = tf.reduce_sum(tf.multiply(a, U),axis=0)
U_t.append(a_broadcasted)
U_tcs = tf.concat(U_t,axis=0)