Tags: python, tensorflow, coding-style

Efficient multi-feature similarity with each feature in TensorFlow


I want to calculate the similarity between a set of features (A_feature) and each individual feature in another set (P_feature) in TensorFlow, but I don't know how to write it efficiently. Here is my sample code:

import numpy as np
import tensorflow as tf


num_data = 64
feat_dim = 6
A_feature = np.random.randn(10, feat_dim).astype(np.float32)
P_feature = np.random.randn(5, feat_dim).astype(np.float32)

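# NumPy version: loop over each row of P_feature
out = np.zeros((len(P_feature), 1))
for i in range(len(P_feature)):
    t = A_feature - P_feature[i]          # differences to every row of A_feature, shape (10, feat_dim)
    t1 = t ** 2
    t2 = np.sum(t1, axis=1)               # squared Euclidean distances, shape (10,)
    t3 = np.sum(t2 ** 2.0) ** (1 / 2.0)   # L2 norm of the squared distances
    out[i] = t3

# Partial TensorFlow version: computes the result for P_feature[0] only
P_dist2 = tf.norm(tf.reduce_sum(tf.square(tf.subtract(A_feature, P_feature[0])), 1), ord=2)
with tf.Session() as sess:  # TensorFlow 1.x session API
    pos_dist2_np = sess.run(P_dist2)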

Can anyone tell me how to write this efficiently in TensorFlow? Thanks!


Solution

  • You are almost there. Expand dimensions and use broadcasting to perform the operation for each feature simultaneously:

    aux = tf.subtract(A_feature[None, :, :], P_feature[:, None, :])  # Shape=(5, 10, feat_dim)
    aux = tf.reduce_sum(tf.square(aux), -1)  # Shape=(5, 10)
    P_dist3 = tf.norm(aux, ord=2, axis=-1)  # Shape=(5,)
    
    with tf.Session() as sess:
        pos_dist3_np = sess.run(P_dist3)
    

    Note that this works whether A_feature and P_feature are NumPy arrays or TensorFlow tensors.
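
As a quick sanity check, the broadcasting approach can be compared against the loop from the question. The sketch below uses NumPy only, so it runs without a TensorFlow session; it assumes `np.linalg.norm` as a stand-in for `tf.norm`, and the seeded generator and the names `loop_out`, `sq_dist_all` and `broadcast_out` are introduced here purely for illustration.

```python
import numpy as np

# Fixed seed, chosen only so the check is reproducible (an assumption, not in the question).
rng = np.random.default_rng(0)
feat_dim = 6
A_feature = rng.standard_normal((10, feat_dim)).astype(np.float32)
P_feature = rng.standard_normal((5, feat_dim)).astype(np.float32)

# Reference: the loop from the question, one row of P_feature at a time.
loop_out = np.zeros(len(P_feature), dtype=np.float32)
for i in range(len(P_feature)):
    sq_dist = np.sum((A_feature - P_feature[i]) ** 2, axis=1)  # shape (10,)
    loop_out[i] = np.sqrt(np.sum(sq_dist ** 2))                # L2 norm of squared distances

# Broadcast version: add singleton axes so every (P row, A row) pair is formed at once.
diff = A_feature[None, :, :] - P_feature[:, None, :]           # shape (5, 10, feat_dim)
sq_dist_all = np.sum(diff ** 2, axis=-1)                       # shape (5, 10)
broadcast_out = np.linalg.norm(sq_dist_all, ord=2, axis=-1)    # shape (5,)

print(np.allclose(loop_out, broadcast_out, rtol=1e-4))
```

Note that the broadcast version materializes an intermediate of shape (5, 10, feat_dim), so it trades memory for the removal of the Python loop; for inputs of this size that cost is negligible.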