tensorflow neural-network keras artificial-intelligence

Merge weights of same model trained on 2 different computers using tensorflow

I was doing some research on training deep neural networks using tensorflow. I know how to train a model. My problem is i have to train the same model on 2 different computers with different datasets. Then save the model weights. Later i have to merge the 2 model weight files somehow. I have no idea how to merge them. Is there a function that does this or should the weights be averaged?

Any help on this problem would be useful

Thanks in advance

Solution

It is better to merge weight updates (gradients) during the training and keep a common set of weights rather than trying to merge the weights after individual trainings have completed. Both individually trained networks may find a different optimum and e.g. averaging the weights may give a network which performs worse on both datasets.

There are two things you can do:

Look at 'data parallel training': distributing forward and backward passes of the training process over multiple compute nodes each of which has a subset of the entire data.

In this case typically:

each node propagates a minibatch forward through the network
each node propagates the loss gradient backwards through the network
a 'master node' collects gradients from minibatches on all nodes and updates the weights correspondingly
and distributes the weight updates back to the compute nodes to make sure each of them has the same set of weights

(there are variants of the above to avoid that compute nodes idle too long waiting for results from others). The above assumes that Tensorflow processes running on the compute nodes can communicate with each other during the training.

Look at https://www.tensorflow.org/deploy/distributed) for more details and an example of how to train networks over multiple nodes.

If you really have train the networks separately, look at ensembling, see e.g. this page: https://mlwave.com/kaggle-ensembling-guide/ . In a nutshell, you would train individual networks on their own machines and then e.g. use an average or maximum over the outputs of both networks as a combined classifier / predictor.