python deep-learning lstm cntk

Non-reproducible results even after setting the seed value (Python API of Microsoft CNTK)


Previously, I mentioned that BrainScript offers no option to define a seed value for CNTK sequential machine learning models [1]. I therefore migrated my code to the CNTK Python API, which gives more fine-grained control over the seed values of sequential machine learning models. Below are the places in my implementation that use random initialization, together with the seed value I set for each:

# CNTK imports

import numpy as np
import pandas as pd
import random
import math as m

from cntk.device import *
from cntk import Trainer
from cntk.layers import * 

import cntk
import cntk.ops as o
import cntk.layers as l

# Defining the random seed

np.random.seed(8888)
random.seed(8888)

# Defining input and output training vectors

input_array_df = np.asarray(input_split_df[1:len(input_split_df)], dtype=np.float32)
output_array_df = np.asarray(output_df_df[1:len(output_df_df)], dtype=np.float32)
tup=(input_array_df, output_array_df)
listOfTuplesOfInputsLabels.append(tup)

# Shuffling the input vector

random.shuffle(listOfTuplesOfInputsLabels)

# Defining sequential model

num_minibatches = len(features) // minibatch_size
epoch_size = len(features) * 1

feature = o.input_variable((input_dim), np.float32)
label = o.input_variable((output_dim), np.float32)

netout = Sequential([
    For(range(1), lambda i: Recurrence(LSTM(lstm_cell_dimension, use_peepholes=LSTM_USE_PEEPHOLES, init=glorot_uniform(seed=8888)))),
    Dense(output_dim, bias=BIAS, init=glorot_uniform(seed=8888)),
])(feature)

learner = momentum_sgd(
    netout.parameters,
    lr=learning_rate_schedule([(4, 0.003), (16, 0.002)], unit=UnitType.sample, epoch_size=epoch_size),
    momentum=momentum_as_time_constant_schedule(minibatch_size / -m.log(0.9)),
    gaussian_noise_injection_std_dev=gaussian_noise,
    l2_regularization_weight=l2_regularization_weight,
)

# Splitting into mini-batches

tf = np.array_split(features,num_minibatches)
tl = np.array_split(labels,num_minibatches)

# Train

features = np.ascontiguousarray(tf[i%num_minibatches])
labels = np.ascontiguousarray(tl[i%num_minibatches])
trainer.train_minibatch({feature : features, label : labels})

Unfortunately, even though I set the seed values in my code, I still observe small variations in my final results. Is this because of floating-point calculations? Or is there somewhere else in my code where I should have set a seed value but have not?
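
To make the floating-point part of the question concrete: floating-point addition is not associative, so summing the same numbers in a different order can change the last bits of the result. The sketch below is plain Python, not CNTK, and the helper name `sum_in_order` is made up for illustration. Whether this effect actually explains the variations in my results is an assumption, not something I have verified.

```python
def sum_in_order(values):
    """Accumulate the values one by one, in the order given."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [0.1, 0.2, 0.3]

forward = sum_in_order(values)             # (0.1 + 0.2) + 0.3
backward = sum_in_order(reversed(values))  # (0.3 + 0.2) + 0.1

# Same numbers, different accumulation order, different result.
print(forward == backward)  # False
print(forward, backward)
```

If reductions inside the training loop run in a different order from run to run (for example in parallel), tiny differences like this could accumulate over many mini-batches.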

Thanks!

[1] Defining a seed value in BrainScript for CNTK sequential machine learning models


Solution

  • Can you try the following?

    from _cntk_py import set_fixed_random_seed, force_deterministic_algorithms
    set_fixed_random_seed(1)
    force_deterministic_algorithms()
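
    A minimal sketch of how these calls could be combined with the Python and NumPy seeds from the question, so that everything is seeded in one place. The helper name `seed_everything` is hypothetical, and the imports are guarded because `_cntk_py` is a private module whose availability may depend on the installed CNTK version. Presumably the helper should be called before the model and learner are created.

```python
import random


def seed_everything(seed):
    """Seed each available random source; return the names of those that were seeded."""
    seeded = ["random"]
    random.seed(seed)

    try:
        import numpy as np
        np.random.seed(seed)
        seeded.append("numpy")
    except ImportError:
        pass  # NumPy not installed; skip it

    try:
        # Same private-module import as in the snippet above (assumed to be available).
        from _cntk_py import set_fixed_random_seed, force_deterministic_algorithms
        set_fixed_random_seed(seed)
        force_deterministic_algorithms()
        seeded.append("cntk")
    except ImportError:
        pass  # CNTK not installed; skip it

    return seeded


# Re-seeding with the same value reproduces the same shuffle.
seed_everything(8888)
first = list(range(10))
random.shuffle(first)

seed_everything(8888)
second = list(range(10))
random.shuffle(second)

print(first == second)  # True
```

    The returned list makes it easy to log which sources were actually seeded in a given environment.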