
Keras Backend: Difference between random_normal and random_normal_variable


My neural network has a custom layer that takes an input vector x, generates a normally distributed tensor A, and returns both A (used in subsequent layers) and the product Ax. Assuming I want to reuse the value stored in A at the output of the custom layer in a second, different layer, is there any subtle aspect I need to factor in when deciding which Keras backend function (K.backend.random_normal or K.backend.random_normal_variable) I should use to generate A?
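For illustration, here is a minimal sketch of the kind of layer I mean, assuming Keras 2.1.2 with the TensorFlow backend (the layer name RandomProject is just a placeholder, and to keep the sketch short A is exposed as the attribute self.A instead of being returned as a second output):

from keras import backend as K
from keras.engine.topology import Layer

class RandomProject(Layer):
    """Multiplies its input by a random matrix A sampled once at build time."""

    def __init__(self, units, **kwargs):
        self.units = units
        super(RandomProject, self).__init__(**kwargs)

    def build(self, input_shape):
        # Sampling A here (rather than in call) draws it exactly once;
        # a Variable keeps that draw, so a later layer can reuse self.A.
        self.A = K.random_normal_variable(
            shape=(input_shape[-1], self.units), mean=0.0, scale=0.5)
        super(RandomProject, self).build(input_shape)

    def call(self, x):
        return K.dot(x, self.A)  # Ax

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.units)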

a) The backend function random_normal returns a tensor holding a different value after each call (see the code snippet below). To me, this suggests that random_normal acts as a generator of normally distributed values. Does this mean that one should not use random_normal to generate a normally distributed tensor whose value needs to persist across calls?

b) The backend function random_normal_variable appears safer (see the code snippet below), as it retains its value across calls.

Is my conceptual understanding correct, or am I missing something basic? I am using Keras 2.1.2 and TensorFlow 1.4.0.

Experiment with random_normal (value changes across calls):

In [5]: A = K.random_normal(shape = (2,2), mean=0.0, stddev=0.5) 
In [6]: K.get_value(A)
Out[6]: array([[ 0.4459489 , -0.82019573],
   [-0.39853573, -0.33919844]], dtype=float32)
In [7]: K.get_value(A)
Out[7]: array([[-0.37467018,  0.42445764],
   [-0.573843  , -0.3468301 ]], dtype=float32)

Experiment with random_normal_variable (value holds across calls):

In [9]: B = K.random_normal_variable(shape=(2,2), mean=0., scale=0.5)
In [10]: K.get_value(B)
Out[10]: array([[ 0.07700552,  0.28008622],
   [-0.69484973, -1.32078779]], dtype=float32)
In [11]: K.get_value(B)
Out[11]: array([[ 0.07700552,  0.28008622],
   [-0.69484973, -1.32078779]], dtype=float32)

Solution

  • From my understanding, this is because random_normal_variable returns an instantiated Variable, while random_normal returns a Tensor.

    K.random_normal(shape=(2,2), mean=0.0, stddev=0.5) 
    <tf.Tensor 'random_normal:0' shape=(2, 2) dtype=float32>
    
    K.random_normal_variable(shape=(2,2), mean=0.0, scale=0.5)
    <tf.Variable 'Variable:0' shape=(2, 2) dtype=float32_ref> 
    

    As for why the values vary for the Tensor but not for the Variable, I think the answer in this thread sums it up well:

    Variable is basically a wrapper on Tensor that maintains state across multiple calls to run [...]

    The answer also mentions that a variable needs to be initialized before it can be evaluated, which explains why you could read B without initializing it yourself: the variable returned by random_normal_variable is already initialized, thanks to a call to tensorflow.random_normal_initializer inside the random_normal_variable function. Hope this clarifies why your code behaves this way.
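    As a related sketch in the same Keras 2.1.2 / TensorFlow 1.4 setting (the name A_frozen is my own, not from the thread): if you already have a tensor from K.random_normal and want to pin down a single draw, you can copy its current value into a variable with K.variable, after which it behaves like B above:

    from keras import backend as K

    A = K.random_normal(shape=(2, 2), mean=0.0, stddev=0.5)  # resampled on every evaluation
    A_frozen = K.variable(K.get_value(A))  # evaluate A once, store that draw in a Variable

    print(K.get_value(A_frozen))  # same values...
    print(K.get_value(A_frozen))  # ...on every subsequent call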