python-2.7 tensorflow matrix matrix-multiplication weighted

How to do a weighted sum as a tensor operation in TensorFlow?

I am trying to compute a weighted sum of matrices in TensorFlow. Unfortunately, my dimensions are not small and I run out of memory. It is also possible that I am doing something completely wrong.

I have two tensors: U with shape (B,F,M) and A with shape (C,B). I would like to do a weighted sum followed by stacking.

Weighted sum

For each index c in C, I have a vector of weights a from A, with shape (B,). I want to use it for the weighted sum of U over the B axis to get a matrix U_t with shape (F,M). This is much the same as this, where I found a little help.

Concatenation

I want to do this for each vector a in A to get a list of C matrices U_tc, each with the shape (F,M) mentioned above. After that, I concatenate all matrices in the list to get one large matrix with shape (C*F,M).

My values are C=2500, M=500, F=80, B=300

In the beginning, I tried a very naive approach with many loops and element selection, which generated a very large number of operations. Now, with help from this, I have the following:

import tensorflow as tf

C, M, F, B = 2500, 500, 80, 300
U = tf.Variable(tf.truncated_normal([B, F, M], stddev=1.0, dtype=tf.float32))  # just for example
A = tf.Variable(tf.truncated_normal([C, B], stddev=1.0, dtype=tf.float32))  # just for example

U_t = []

for ccc in xrange(C):
    a = A[ccc, :]  # weights for block ccc, shape (B,)
    # tile the weights up to the shape of U, (B,F,M)
    a_broadcasted = tf.tile(tf.reshape(a, [B, 1, 1]), tf.stack([1, F, M]))
    # weighted sum over B, shape (F,M)
    U_t.append(tf.reduce_sum(tf.multiply(U, a_broadcasted), axis=0))

U_tcs = tf.concat(U_t, axis=0)

Unfortunately, this fails with a memory error. I am not sure whether I did something wrong or whether the computation simply involves too many operations. I don't think the variables themselves are too large for memory; at least, I have had larger variables before and it was fine. (I have 16 GB of GPU memory.)

Am I doing that weighted sum correctly?

Any idea how to do it more efficiently?

I will appreciate any help. Thanks.

Solution

1. Weighted sum and Concatenation

You can use vectorized operations directly, without loops, when memory is not a constraint.

import tensorflow as tf

C,M,F,B=2500,500,80,300
U = tf.Variable(tf.truncated_normal([B, F, M],stddev=1.0 ,dtype=tf.float32)) #just for example
A = tf.Variable(tf.truncated_normal([C, B], stddev=1.0, dtype=tf.float32))  # just for example

# shape=(C,B,1,1)
A_new = tf.expand_dims(tf.expand_dims(A,-1),-1)
# the product broadcasts to (C,B,F,M); summing over B gives shape=(C,F,M)
U_t = tf.reduce_sum(tf.multiply(A_new, U), axis=1)

# shape=(C*F,M)
U_tcs = tf.reshape(U_t,(C*F,M))

2. Memory error

In fact, I also had memory errors when I ran the above code.

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[2500,300,80,500]...
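
The shape in that error message is the broadcast product of A_new and U. A rough back-of-the-envelope check, assuming float32 (4 bytes per element) and the sizes from the question, suggests why it cannot fit on the GPU. The helper name size_gb is only for illustration.

```python
# Sizes from the question.
C, M, F, B = 2500, 500, 80, 300
BYTES_PER_FLOAT32 = 4  # assumption: all tensors are float32


def size_gb(*shape):
    """Footprint of a float32 tensor with the given shape, in GB (10**9 bytes)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * BYTES_PER_FLOAT32 / 1e9


intermediate_gb = size_gb(C, B, F, M)  # broadcast product A_new * U
result_gb = size_gb(C, F, M)           # final weighted sums
u_gb = size_gb(B, F, M)                # input U
```

Under these assumptions the broadcast product needs about 120 GB, while U and the final result need only about 0.05 GB and 0.4 GB, so it is the intermediate tensor, not the variables, that exhausts memory.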

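With a small modification to the code above, it runs properly on my GPU with 8 GB of memory. Looping over C means each step only produces a (B,F,M) product instead of the full (C,B,F,M) tensor.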

import tensorflow as tf

C,M,F,B=2500,500,80,300
U = tf.Variable(tf.truncated_normal([B, F, M],stddev=1.0 ,dtype=tf.float32)) #just for example
A = tf.Variable(tf.truncated_normal([C, B], stddev=1.0, dtype=tf.float32))  # just for example

# shape=(C,B,1,1)
A_new = tf.expand_dims(tf.expand_dims(A,-1),-1)

U_t = []
for ccc in range(C):
    a = A_new[ccc, :]  # shape=(B,1,1), broadcasts against U
    weighted = tf.reduce_sum(tf.multiply(a, U), axis=0)  # shape=(F,M)
    U_t.append(weighted)
# shape=(C*F,M)
U_tcs = tf.concat(U_t, axis=0)
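
Because the weighted sum contracts the B axis of A with the B axis of U, it can also be written as a single matrix product, which should avoid both the (C,B,F,M) intermediate and the Python loop. The sketch below checks that equivalence in NumPy on small, made-up sizes (not the values from the question). In TensorFlow the same idea would presumably be tf.matmul(A, tf.reshape(U, [B, F*M])) followed by tf.reshape(..., [C*F, M]); that variant is untested on the question's sizes.

```python
import numpy as np

# Small hypothetical sizes so the check runs instantly.
C, B, F, M = 4, 3, 2, 5
rng = np.random.RandomState(0)
U = rng.randn(B, F, M)
A = rng.randn(C, B)

# Reference: one (F, M) weighted sum per row of A, stacked along axis 0.
loop_result = np.concatenate(
    [np.sum(A[c].reshape(B, 1, 1) * U, axis=0) for c in range(C)], axis=0
)

# Matrix-product form: (C, B) x (B, F*M) -> (C, F*M), then reshape to (C*F, M).
matmul_result = A.dot(U.reshape(B, F * M)).reshape(C * F, M)

print(np.allclose(loop_result, matmul_result))
```

The largest array this form creates is the (C, F*M) result itself, which is the same size as the output the question asks for.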