
Why is the number of parameters after keras.layers.Bidirectional not doubled?


Below are the code and its output. There are two models, and the second one wraps the GRU in Bidirectional. My question is: why is the number of parameters of time_distributed_14 (264) not double that of time_distributed_13 (136)? I know that 264 = 136 * 2 - 8, but why do we need to subtract 8 here?

from keras.models import Sequential
# GRU is importable from keras.layers; the old keras.layers.recurrent module no longer exists in recent Keras
from keras.layers import Dense, Activation, TimeDistributed, Bidirectional, GRU
import numpy as np

InputSize = 15
MaxLen = 64
HiddenSize = 16

OutputSize = 8
n_samples = 1000

model1 = Sequential()
model1.add(GRU(HiddenSize, return_sequences=True, input_shape=(MaxLen, InputSize)))
model1.add(TimeDistributed(Dense(OutputSize)))
model1.add(Activation('softmax'))
model1.compile(loss='categorical_crossentropy', optimizer='rmsprop')


model2 = Sequential()
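# Bidirectional runs a forward and a backward GRU and, by default, concatenates their outputs (16 + 16 = 32 features)
model2.add(Bidirectional(GRU(HiddenSize, return_sequences=True), input_shape=(MaxLen, InputSize)))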
model2.add(TimeDistributed(Dense(OutputSize)))
model2.add(Activation('softmax'))
model2.compile(loss='categorical_crossentropy', optimizer='rmsprop')


print(model1.summary())
print(model2.summary())

Output:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
gru_9 (GRU)                  (None, 64, 16)            1536      
_________________________________________________________________
time_distributed_13 (TimeDis (None, 64, 8)             136       
_________________________________________________________________
activation_6 (Activation)    (None, 64, 8)             0         
=================================================================
Total params: 1,672
Trainable params: 1,672
Non-trainable params: 0
_________________________________________________________________
None
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
bidirectional_7 (Bidirection (None, 64, 32)            3072      
_________________________________________________________________
time_distributed_14 (TimeDis (None, 64, 8)             264       
_________________________________________________________________
activation_7 (Activation)    (None, 64, 8)             0         
=================================================================
Total params: 3,336
Trainable params: 3,336
Non-trainable params: 0
_________________________________________________________________
None
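Note that the Bidirectional layer itself does double its parameters (1536 to 3072), because it holds two independent GRUs. Below is a minimal sketch that reproduces the GRU rows of the summaries above by hand. It assumes the Keras 2.x GRU with reset_after=False (one bias vector for the three gate blocks), which is what yields 1536; newer Keras versions default to reset_after=True and would report 1584. The helper names gru_params and bidirectional_params are hypothetical, not Keras APIs.

```python
# Hand count of the GRU rows in the summaries above (hypothetical helpers).
# Assumption: Keras 2.x GRU with reset_after=False, which gives 1536;
# newer Keras defaults to reset_after=True (an extra bias vector).

def gru_params(input_dim, units, reset_after=False):
    """Parameters of one GRU: update gate, reset gate and candidate blocks."""
    kernel = input_dim * 3 * units               # input -> 3 blocks
    recurrent = units * 3 * units                # hidden state -> 3 blocks
    bias = (2 if reset_after else 1) * 3 * units
    return kernel + recurrent + bias

def bidirectional_params(layer_params):
    """Bidirectional keeps two independent copies (forward and backward)."""
    return 2 * layer_params

InputSize, HiddenSize = 15, 16

gru = gru_params(InputSize, HiddenSize)
print(gru)                                                  # 1536, as in gru_9
print(bidirectional_params(gru))                            # 3072, as in bidirectional_7
print(gru_params(InputSize, HiddenSize, reset_after=True))  # 1584 with the newer default
```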

Solution

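  • A Dense layer has not only weights but also biases, and the biases ignore the inputs entirely: there is one bias per output unit, whatever the input size. Bidirectional concatenates the outputs of the forward and backward GRUs (its default merge_mode is 'concat'), so the Dense layer receives 2 * 16 = 32 input features instead of 16. That doubles the weights but leaves the 8 biases unchanged, which is why 264 = 2 * 128 + 8 = 2 * 136 - 8.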

    weights = input * output 
           - regular: = 16*8 = 128
           - bidirec: = 32*8 = 256
    
    biases = output
           - regular: = 8
           - bidirec: = 8
    
    parameters = weights + biases
           - regular: = 128 + 8 = 136
           - bidirec: = 256 + 8 = 264
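The breakdown above can be checked with a few lines of plain Python. This is a sketch with a hypothetical helper, dense_params, assuming Bidirectional's default concatenation (32 features into the Dense layer). TimeDistributed applies the same Dense weights at every time step, so MaxLen (64) does not enter the count.

```python
# Hand count of the TimeDistributed(Dense(OutputSize)) rows (hypothetical helper).
# TimeDistributed shares one Dense layer across all time steps,
# so the sequence length does not affect the parameter count.

def dense_params(input_dim, units):
    weights = input_dim * units  # kernel: one weight per (input, output) pair
    biases = units               # bias: one per output unit, independent of input_dim
    return weights + biases

HiddenSize, OutputSize = 16, 8

regular = dense_params(HiddenSize, OutputSize)      # GRU output: 16 features
bidirec = dense_params(2 * HiddenSize, OutputSize)  # forward + backward concatenated: 32 features

print(regular)  # 136
print(bidirec)  # 264
# Only the kernel doubles, so doubling the whole count overcounts the bias by OutputSize.
print(2 * regular - OutputSize == bidirec)  # True
```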