
How to manipulate weights of an skflow model after fit/partial_fit?


I am building more than one DNN model based on TensorFlow's skflow library. I partition my data into minibatches and use partial_fit for fitting. After every cycle of partial_fit, I would like to copy the weights of the first n hidden layers of one TensorFlowDNNClassifier model to another TensorFlowDNNClassifier model, and then continue learning/copying with partial_fit. (The topology of the first n hidden layers is identical in both models.)

I know how to retrieve weights from classifier1:

classifier1.get_tensor_value('dnn/layer0/Linear/Matrix:0')

But I don't know how to copy their values into classifier2!

The use case:

I am trying to build an ensemble of M DNN models based on skflow's TensorFlowDNNClassifier/TensorFlowDNNRegressor. I would like these M models to share their first n layers: the same inputs, architecture, and weight values. I want to do this with minimal changes to skflow's original code. To that end, I divide my data into minibatches and train the models one minibatch at a time. During each step (using one minibatch), I apply partial_fit on one model and copy the weights of its first n hidden layers to the next model in the ensemble. I then partial_fit the second model on the same minibatch, copy the new weight values to the next model, and repeat this training/copying until I reach the last model in the ensemble. After training the Mth model, I copy the weights of its first n hidden layers back to all of the previous (M-1) models. I then repeat this with the next minibatch until the weights of all M models converge. (A sketch of this loop, built on the helper functions introduced in the EDIT below, appears at the end of that EDIT.)

EDIT: Thanks to @Ismael and @ilblackdragon (via another forum) for their valuable input. Their suggested solutions work best at model creation time. I had to add extra functions to TensorFlowEstimator so that I can easily copy weights from one model to another while training (over multiple minibatch steps). I added the following functions to the class TensorFlowEstimator (defined in the file estimators/base.py):

def extract_num_hidden_layers(self, graph_ops):
    # probe the graph for 'dnn/layerN/Linear/Matrix' ops to count hidden layers
    nhl = 0
    are_there_more_layers = True
    while are_there_more_layers:
        are_there_more_layers = False
        layer_name = 'dnn/layer' + str(nhl) + '/Linear/Matrix'
        for op in graph_ops:
            if op.name == layer_name:
                nhl += 1
                are_there_more_layers = True
                break
    return nhl

def create_updaters(self):
    # one assign op per layer, all fed through the shared placeholder self.nValues
    self.weight_updaters = []
    self.bias_updaters = []
    with tf.variable_scope('', reuse=True):
        for h in range(self.num_hidden_layers):
            w_name = 'dnn/layer' + str(h) + '/Linear/Matrix'
            self.weight_updaters.append(
                tf.assign(tf.get_variable(w_name), self.nValues))
            b_name = 'dnn/layer' + str(h) + '/Linear/Bias'
            self.bias_updaters.append(
                tf.assign(tf.get_variable(b_name), self.nValues))

def get_layer_weights(self, layer_num):
    layer_name = 'dnn/layer' + str(layer_num) + '/Linear/Matrix:0'
    return self.get_tensor_value(layer_name)

def get_layer_biases(self, layer_num):
    layer_name = 'dnn/layer' + str(layer_num) + '/Linear/Bias:0'
    return self.get_tensor_value(layer_name)

def get_layer_params(self, layer_num):
    return [self.get_layer_weights(layer_num), self.get_layer_biases(layer_num)]

def set_layer_weights(self, layer_num, weights_values):
    self._session.run(self.weight_updaters[layer_num],
                      feed_dict={self.nValues: weights_values})

def set_layer_biases(self, layer_num, biases_values):
    self._session.run(self.bias_updaters[layer_num],
                      feed_dict={self.nValues: biases_values})

def set_layer_params(self, layer_num, params_values):
    self.set_layer_weights(layer_num, params_values[0])
    self.set_layer_biases(layer_num, params_values[1])
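
With these getters and setters in place, copying the shared layers from one estimator to another is a short loop. The helper below is not part of the patch itself, just a convenience sketch built on get_layer_params/set_layer_params:

def copy_hidden_layers(src, dst, n):
    # copy the weights and biases of the first n hidden layers from
    # estimator src to estimator dst (their topologies must match)
    for layer_num in range(n):
        dst.set_layer_params(layer_num, src.get_layer_params(layer_num))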

I then added the following lines to the function _setup_training, right after the model's graph is created with self.model_fn(self._inp, self._out):

        graph_ops = self._graph.get_operations()
        self.num_hidden_layers = self.extract_num_hidden_layers(graph_ops)
        # shape-agnostic placeholder shared by all updater ops
        self.nValues = tf.placeholder(tf.float32)

        # builds self.weight_updaters and self.bias_updaters
        self.create_updaters()
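
Because self.nValues is created without a shape, the single placeholder can feed every assign op; each set_layer_* call simply supplies an array of the right shape for that layer.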

And here is how to use the getter and setter functions:

from sklearn import datasets
import skflow

iris = datasets.load_iris()
classifier = skflow.TensorFlowDNNClassifier(hidden_units=[10, 5, 4],
                                            n_classes=3,
                                            continue_training=True)
classifier.fit(iris.data, iris.target)
l1b = classifier.get_layer_biases(1)
l1b[3] = 2  # manually change a value for the demo
classifier.set_layer_biases(1, l1b)
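
Finally, here is the round-robin loop from the use case above, sketched on top of the copy_hidden_layers helper. models, minibatches, and n_shared are illustrative names that you would set up yourself:

# models: a list of M TensorFlowDNNClassifier instances created with
# continue_training=True and identical first n_shared hidden layers;
# minibatches: an iterable of (X, y) pairs -- all illustrative names
for X, y in minibatches:
    for i, model in enumerate(models):
        model.partial_fit(X, y)
        if i < len(models) - 1:
            # hand the freshly trained shared layers to the next model
            copy_hidden_layers(model, models[i + 1], n_shared)
    # after the Mth model trains, push its shared layers back to all others
    for other in models[:-1]:
        copy_hidden_layers(models[-1], other, n_shared)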

Solution

  • You should use TensorFlowEstimator, in which you can define your own custom model; basically, you can insert any TensorFlow code into a custom model.

    So if you know how to retrieve the weights, you can use tf.Variable and pass them to a new DNN as its initial value, since tf.Variable can take a Tensor, or a Python object convertible to a Tensor, as its initial value. So I am thinking that the transfer of weights should look something like this:

    weights_i = classifier_i.get_tensor_value('dnn/layer0/Linear/Matrix:0')
    
    def my_model_i_plus_1(X, y):
        W = tf.Variable(weights_i)
        b = tf.Variable(tf.zeros([weights_i.shape[1]]))  # width inferred from the copied weights
    
        layer = tf.nn.relu(tf.matmul(X, W) + b)
    
        return skflow.models.logistic_regression(layer, y)
    
    
    classifier_i_plus_1 = skflow.TensorFlowEstimator(model_fn=my_model_i_plus_1,
                                        n_classes=3,
                                        optimizer="SGD")