
CTC loss implementation in Keras


I am trying to implement a CTC loss with Keras for my simplified neural network:

  
def ctc_lambda_func(args):
    y_pred, y_train, input_length, label_length = args
 
    return K.ctc_batch_cost(y_train, y_pred, input_length, label_length)


x_train = x_train.reshape(x_train.shape[0],20, 10).astype('float32')

input_data = layers.Input(shape=(20,10,))
x=layers.Convolution1D(filters=256, kernel_size=3,  padding="same", strides=1, use_bias=False ,activation= 'relu')(input_data)
x=layers.BatchNormalization()(x)
x=layers.Dropout(0.2)(x)

x=layers.Bidirectional (LSTM(units=200 , return_sequences=True)) (x)
x=layers.BatchNormalization()(x)
x=layers.Dropout(0.2)(x)


y_pred=outputs = layers.Dense(5, activation='softmax')(x)
fun = Model(input_data, y_pred)
# fun.summary()

label_length=np.zeros((3800,1))
input_length=np.zeros((3800,1))

for i in range (3799):
    label_length[i,0]=4
    input_length[i,0]=5 
  
y_train = np.array(y_train)
x_train = np.array(x_train)
input_length = np.array(input_length)
label_length = np.array(label_length) 

  
loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([y_pred, y_train, input_length, label_length])
model =keras.models.Model(inputs=[input_data, y_train, input_length, label_length], outputs=loss_out)
model.compile(loss={'ctc': lambda y_train, y_pred: y_pred}, optimizer = 'adam')
model.fit(x=[x_train, y_train, input_length, label_length],  epochs=10, batch_size=100)

y_true (that is, y_train) has shape (3800, 4), so I set label_length=4 and input_length=5 (+1 for the blank).

I get this error:

ValueError: Input tensors to a Model must come from `tf.keras.Input`. Received: [[0. 1. 0. 0.]
 [0. 1. 0. 0.]
 [0. 1. 0. 0.]
 ...
 [1. 0. 0. 0.]
 [1. 0. 0. 0.]
 [1. 0. 0. 0.]] (missing previous layer metadata).

y_true looks like this:

 [[0. 1. 0. 0.]
 [0. 1. 0. 0.]
 ...
 [1. 0. 0. 0.]
 [1. 0. 0. 0.]
 [1. 0. 0. 0.]]

What is my problem?


Solution

  • You misunderstood the lengths. They are not the number of label classes; they are the actual lengths of the sequences. CTC can only be used when the number of target symbols is smaller than the number of input states. Technically, the number of inputs and outputs is the same, but some of the outputs are blanks. (This typically happens in speech recognition, where you have plenty of input signal windows and relatively few phonemes in the output.)

    Assuming you padded the inputs and outputs to fit them into a batch:

    • input_length should contain, for each item in the batch, how many inputs are actually valid, i.e., not padding;

    • label_length should contain how many non-blank labels the model should produce for each item in the batch.
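
To make the two length arguments concrete, here is a minimal pure-Python sketch. It is not the asker's pipeline: the helper names (`ctc_collapse`, `min_input_steps`, `batch_lengths`) and the padding value `PAD = -1` are invented for illustration, and it assumes the labels are integer class indices with the blank as the last class (index 4 for a 5-way softmax). In the question's network the time axis appears to stay at 20 steps through the "same"-padded convolution and the LSTM, so for unpadded samples input_length would presumably be 20 rather than 5.

```python
PAD = -1    # hypothetical padding value used to fill short label rows
BLANK = 4   # assumption: blank is the last class of a 5-way softmax


def ctc_collapse(path, blank=BLANK):
    """Merge consecutive repeats, then drop blanks (the CTC collapsing rule)."""
    out, prev = [], None
    for symbol in path:
        if symbol != prev and symbol != blank:
            out.append(symbol)
        prev = symbol
    return out


def min_input_steps(labels):
    """Fewest time steps needed to emit `labels`: one step per label,
    plus one blank between each pair of equal neighbouring labels."""
    repeats = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    return len(labels) + repeats


def batch_lengths(padded_labels, valid_input_steps):
    """Build (input_length, label_length), one single-element row per batch item."""
    label_length = [[sum(1 for s in row if s != PAD)] for row in padded_labels]
    input_length = [[n] for n in valid_input_steps]
    return input_length, label_length


# Two padded label sequences; both inputs use all 20 time steps (no padding).
padded_labels = [[0, 1, 1, PAD], [2, PAD, PAD, PAD]]
input_length, label_length = batch_lengths(padded_labels, [20, 20])

# A 9-step output path collapses to the 3-symbol target [0, 1, 1].
collapsed = ctc_collapse([4, 0, 0, 4, 1, 4, 1, 1, 4])

# Each target must fit into its valid input steps, or the CTC loss is infeasible.
feasible = [
    min_input_steps([s for s in row if s != PAD]) <= steps[0]
    for row, steps in zip(padded_labels, input_length)
]
```

With these values, label_length counts only the real labels in each row (3 and 1), while input_length reflects the model's valid output time steps, not the number of classes.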