Tags: python, tensorflow2.0, tf.keras, keras-layer

Understanding tf.keras.layers.Dense()


I am trying to understand why there is a difference between computing a dense layer operation directly and using the Keras implementation.

According to the documentation (https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense), tf.keras.layers.Dense() should implement the operation output = activation(dot(input, kernel) + bias), yet result and result1 below are not the same.
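
To make the formula concrete, this is how I am reading it for my shapes (a minimal standalone sketch; the variable names are mine):

import tensorflow as tf

# My reading: kernel W of shape (5, 10), input x as a column vector of
# shape (10, 1), bias b of shape (5, 1):  output = activation(W @ x + b)
x = tf.random.uniform(shape=(10, 1))
W = tf.random.uniform(shape=(5, 10))
b = tf.random.uniform(shape=(5, 1))

out = tf.nn.relu(tf.matmul(W, x) + b)
print(out.shape)  # (5, 1)

Here is the full comparison: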

import tensorflow as tf

tf.random.set_seed(1)

bias = tf.Variable(tf.random.uniform(shape=(5,1)), dtype=tf.float32)
kernel = tf.Variable(tf.random.uniform(shape=(5,10)), dtype=tf.float32)
x = tf.constant(tf.random.uniform(shape=(10,1), dtype=tf.float32))

# Direct computation: (5,10) @ (10,1) + (5,1) -> (5,1)
result = tf.nn.relu(tf.linalg.matmul(a=kernel, b=x) + bias)
tf.print(result)

test = tf.keras.layers.Dense(units=5,
                             activation='relu',
                             use_bias=True,
                             kernel_initializer=tf.keras.initializers.Constant(value=kernel),
                             bias_initializer=tf.keras.initializers.Constant(value=bias),
                             dtype=tf.float32)

# The layer expects batch-major input, hence the transpose to (1,10)
result1 = test(tf.transpose(x))

print()
tf.print(result1)

Output:


[[2.87080455]
 [3.25458574]
 [3.28776264]
 [3.14319134]
 [2.04760242]]

[[2.38769 3.63470697 2.62423944 3.31286287 2.91121125]]

Using test.get_weights() I can see that the kernel and bias are being set to the correct values. I am using TF version 2.12.0.
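
For reference, this is the kind of check I mean (a quick sketch; w and b are just local names):

w, b = test.get_weights()
print((w.flatten() == kernel.numpy().flatten()).all())  # True: same values
print((b.flatten() == bias.numpy().flatten()).all())    # True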


Solution

  • After some experimentation I realized that the kernel for the dense layer needs to be of shape=(10,5), as opposed to the (5,10) in the code from the original question above. This follows from the layer's convention: the kernel has shape (input_dim, units), so with units=5 and an input vector of size 10 it must be (10,5) (hence why input_shape=(10,) is left commented out below as a reminder). Below is the corrected code:

    import tensorflow as tf

    tf.random.set_seed(1)

    bias   = tf.Variable(tf.random.uniform(shape=(5,1)), dtype=tf.float32)
    kernel = tf.Variable(tf.random.uniform(shape=(10,5)), dtype=tf.float32)
    x = tf.constant(tf.random.uniform(shape=(10,1), dtype=tf.float32))

    # kernel is now (input_dim, units) = (10,5), so the direct computation
    # uses its transpose: (5,10) @ (10,1) + (5,1) -> (5,1)
    result = tf.nn.relu(tf.linalg.matmul(a=kernel, b=x, transpose_a=True) + bias)
    tf.print(result)

    test = tf.keras.layers.Dense(units=5,
                                 # input_shape=(10,),
                                 activation='relu',
                                 use_bias=True,
                                 kernel_initializer=tf.keras.initializers.Constant(value=kernel),
                                 bias_initializer=tf.keras.initializers.Constant(value=bias),
                                 dtype=tf.float32)

    result1 = test(tf.transpose(x))  # batch-major input of shape (1,10)

    print()
    tf.print(result1)
    
    
    Output:

    [[2.38769]
     [3.63470697]
     [2.62423944]
     [3.31286287]
     [2.91121125]]
    
    [[2.38769 3.63470697 2.62423944 3.31286287 2.91121125]]
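
    Equivalently, the layer's arithmetic can be checked by hand in batch-major form (a quick sketch reusing the variables above; w and b2 are local names):

    w, b2 = test.get_weights()                                # w: (10,5), b2: (5,)
    manual = tf.nn.relu(tf.matmul(tf.transpose(x), w) + b2)   # (1,10) @ (10,5) + (5,)
    tf.print(manual)                                          # matches result1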
    

    Ultimately, I am not entirely sure what was happening under the hood or why Keras did not raise an error. I will check against the tf.keras.layers.Dense() implementation, but any thoughts or suggestions from someone who already knows the code are highly appreciated!
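
    One plausible explanation (my unverified reading of the initializer code): tf.keras.initializers.Constant essentially forwards to tf.constant(value, shape=shape) when the layer builds its variables, and tf.constant silently reshapes any value whose element count matches the requested shape. A (5,10) array has exactly the 50 elements a (10,5) kernel needs, so the weights would be reshaped row-major, which is not the same as a transpose, and no error is raised. A sketch of that behavior:

    import tensorflow as tf

    value = tf.random.uniform(shape=(5, 10))        # 50 elements in the "wrong" layout
    init = tf.keras.initializers.Constant(value=value)
    built = init(shape=(10, 5), dtype=tf.float32)   # the shape Dense actually requests

    # Same 50 values, but arranged as a row-major reshape -- which is NOT
    # a transpose, hence the silently different results in the question.
    tf.print(tf.reduce_all(built == tf.reshape(value, (10, 5))))  # 1 (True)
    tf.print(tf.reduce_all(built == tf.transpose(value)))         # 0 (False)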