
CNTK: A loss function for sequence to sequence processing

I'm building a sequence-to-sequence model for phoneme alignment. Specifically, my training data consists of paired sequences (phoneme, length), where each phoneme is a one-hot vector and each length is a float. I want to feed the model a phoneme sequence and get back the corresponding sequence of lengths.

My network is built roughly like this:

model = Sequential(
    EmbeddingLayer{embeddingSize} :
    RecurrentLSTMLayerStack {lstmDims} :
    LinearLayer{1}
)

The LinearLayer{1} should map from lstmDims to 1, if I understand correctly. So when I feed the model a sequence of length N, I should get a resulting sequence of length N as well.

Now I want to set up a proper loss function, which I think should be the average absolute difference between the elements of the known target sequence and the model output. The averaging should be done over the time axis, so that sequences of different lengths can be handled.
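To make the intended quantity concrete, here is a minimal sketch in plain Python (not BrainScript) of a per-sequence mean absolute error, averaged over the time axis. The function name `mean_abs_error` and the sample numbers are made up for illustration; they are not part of CNTK.

```python
def mean_abs_error(targets, outputs):
    """Average |target - output| over the time axis of a single sequence."""
    if len(targets) != len(outputs):
        raise ValueError("target and output sequences must have the same length")
    if not targets:
        raise ValueError("sequences must be non-empty")
    total = sum(abs(t - o) for t, o in zip(targets, outputs))
    return total / len(targets)


# Two sequences of different lengths; each is averaged over its own length.
short_loss = mean_abs_error([0.5, 0.25], [0.25, 0.25])
long_loss = mean_abs_error([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(short_loss, long_loss)
```

Because each sequence is divided by its own length, a long sequence does not contribute a larger loss than a short one merely by having more steps.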

I was planning to do something like

objectives = Input(1) #actually a sequence here as stated in the reader
result = model(features)
errs = Abs(objectives - result)
loss_function = ReduceMean(errs)
criterionNodes  = (loss_function)

but the Reduction Operations documentation explicitly states:

These operations do not support reduction over sequences. Instead, you can achieve this with a recurrence.

I'm not sure how to use a recurrence for this task, and I'm also not sure whether the overall approach is sound.


  • You need two recurrences, neither of which is complicated (for the second one we use a "builtin" operation whose implementation is in the file):

    sum = errs + PastValue (0, sum, defaultHiddenActivation=0)
    count = BS.Loop.Count(errs)
    loss_function = sum / count
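The arithmetic behind these two recurrences can be sketched in plain Python (not BrainScript). The sketch assumes that `PastValue` yields the value from the previous time step and falls back to the given initial value (0) at the first step. The helper names `running_sum` and `sequence_mean` are hypothetical. It reads the average off the last time step; whether the BrainScript expression needs an explicit step to select that last element is not something this sketch settles.

```python
def running_sum(errs, initial=0.0):
    """Emulate sum[t] = errs[t] + sum[t-1], with sum[-1] taken as `initial`."""
    sums = []
    previous = initial  # plays the role of the delayed (past) value
    for e in errs:
        previous = e + previous
        sums.append(previous)
    return sums


def sequence_mean(errs):
    """Divide the final running sum by the number of time steps."""
    if not errs:
        raise ValueError("sequence must be non-empty")
    return running_sum(errs)[-1] / len(errs)


errs = [0.5, 0.0, 0.5, 0.0]        # per-step absolute errors (made-up values)
partial_sums = running_sum(errs)   # running total at every time step
loss = sequence_mean(errs)         # average over the time axis
print(partial_sums, loss)
```

Only the final running sum covers the whole sequence, so dividing that value by the step count gives the same result as a direct mean over the time axis.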