I need to make changes to an existing deeplearning4j (DL4J) model that has already been trained. The network consists of an input layer, one GravesLSTM layer, and one RNN output layer.
My question is: is it possible to add one or more untrained neurons to the LSTM layer without rebuilding the model from a new configuration (which, I assume, would require retraining it)? I'd like to do things like add one or more neurons to an existing layer, or add an entire (untrained) layer to a trained model.
Are these possible? I couldn't find any references to this, but I've seen folks doing it in other languages/frameworks so I wonder if I can also do it in DL4J.
BTW I'm aware this is an unusual thing to be doing. Please ignore the fact it will mess up the trained network, I just need to know if I can do it and how to go about it. :)
Any pointers will help!
Thanks!
Eduardo
You would use the transfer learning API to do that. See our examples here: https://github.com/deeplearning4j/dl4j-examples/blob/master/dl4j-spark-examples/dl4j-spark/src/main/java/org/deeplearning4j/transferlearning/vgg16/TransferLearning.md
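For your specific case (a MultiLayerNetwork with a GravesLSTM and an RNN output layer), a rough sketch could look like the following. Treat it only as a sketch: the layer index, sizes, and file name are assumptions about your network, and the replaced/added parameters come back untrained, so the caveat about disturbing the trained weights still applies.

import java.io.File;
import org.deeplearning4j.nn.conf.Updater;
import org.deeplearning4j.nn.conf.layers.GravesLSTM;
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.transferlearning.FineTuneConfiguration;
import org.deeplearning4j.nn.transferlearning.TransferLearning;
import org.deeplearning4j.nn.weights.WeightInit;
import org.deeplearning4j.util.ModelSerializer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

// All sizes, indices, and the file name below are illustrative, not taken from your model.
MultiLayerNetwork trainedNet = ModelSerializer.restoreMultiLayerNetwork(new File("my-model.zip"));
int lstmSize = 200;   // assumed hidden size of your GravesLSTM layer
int nClasses = 10;    // assumed nOut of your RNN output layer

FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
    .updater(Updater.NESTEROVS)
    .learningRate(1e-3)
    .build();

// Widen the existing LSTM layer (assumed to be layer index 0): its new weights are
// re-initialized, so the layer effectively gains untrained neurons, and the nIn of
// the following layer is adjusted for you.
MultiLayerNetwork widened = new TransferLearning.Builder(trainedNet)
    .fineTuneConfiguration(fineTuneConf)
    .nOutReplace(0, lstmSize + 50, WeightInit.XAVIER)
    .build();

// Add an entire untrained layer: drop the old output layer, append a new LSTM,
// then put a fresh RnnOutputLayer back on top.
MultiLayerNetwork deeper = new TransferLearning.Builder(trainedNet)
    .fineTuneConfiguration(fineTuneConf)
    .removeOutputLayer()
    .addLayer(new GravesLSTM.Builder()
        .nIn(lstmSize).nOut(lstmSize)
        .activation(Activation.TANH).build())
    .addLayer(new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
        .activation(Activation.SOFTMAX)
        .nIn(lstmSize).nOut(nClasses).build())
    .build();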
Docs below:
The DL4J transfer learning API enables users to:
- modify the architecture of an existing model
- fine-tune the learning configuration of an existing model
- hold the parameters of specified layers constant during training, also referred to as "frozen"
Holding certain layers frozen on a network and training is effectively the same as training on a transformed version of the input, the transformed version being the intermediate outputs at the boundary of the frozen layers. This is the process of “feature extraction” from the input data and will be referred to as “featurizing” in this document.
The forward pass to "featurize" the input data on large, pretrained networks can be time-consuming. DL4J also provides a TransferLearningHelper class that can featurize an input dataset and save it for future use, and fit the model (with its frozen layers) using that featurized dataset.
When running multiple epochs, users save on computation time, since the expensive forward pass through the frozen layers/vertices only has to be conducted once.
This example uses VGG16 to classify images belonging to five categories of flowers. The dataset is downloaded automatically from http://download.tensorflow.org/example_images/flower_photos.tgz
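The later snippets iterate over this dataset via trainIter/testIter. A minimal sketch of setting those up, assuming the FlowerDataSetIterator helper that ships with the dl4j-examples repository (the package path, method names, and values below are assumptions based on that repo):

import org.deeplearning4j.examples.transferlearning.vgg16.dataHelpers.FlowerDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

// Batch size and train/test split are illustrative values.
int batchSize = 15;
int trainPerc = 80;
FlowerDataSetIterator.setup(batchSize, trainPerc);           // downloads and splits the flower data
DataSetIterator trainIter = FlowerDataSetIterator.trainIterator();
DataSetIterator testIter  = FlowerDataSetIterator.testIterator();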
I. Importing VGG16
TrainedModelHelper modelImportHelper = new TrainedModelHelper(TrainedModels.VGG16);
ComputationGraph vgg16 = modelImportHelper.loadModel();
FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
    .learningRate(5e-5)
    .updater(Updater.NESTEROVS)
    .seed(seed)
    .build();
The final layer of VGG16 performs a softmax regression over the 1000 classes in ImageNet. We modify the very last layer to give predictions for five classes, keeping the other layers frozen.
ComputationGraph vgg16Transfer = new TransferLearning.GraphBuilder(vgg16)
    .fineTuneConfiguration(fineTuneConf)
    .setFeatureExtractor("fc2")
    .removeVertexKeepConnections("predictions")
    .addLayer("predictions",
        new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
            .nIn(4096).nOut(numClasses)
            .weightInit(WeightInit.XAVIER)
            .activation(Activation.SOFTMAX).build(), "fc2")
    .build();
After a mere thirty iterations, which in this case is exposure to 450 images, the model attains an accuracy > 75% on the test dataset. This is rather remarkable considering the complexity of training an image classifier from scratch.
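As a usage note, a minimal train-and-evaluate pass with the iterators assumed earlier could look like this (a single call to fit runs one epoch over the iterator):

import org.deeplearning4j.eval.Evaluation;

vgg16Transfer.fit(trainIter);                        // trains only the unfrozen "predictions" layer
Evaluation eval = vgg16Transfer.evaluate(testIter);  // evaluates on the held-out flower images
System.out.println(eval.stats());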
Here we hold all but the last three dense layers frozen and attach new dense layers to them. Note that the primary intent here is to demonstrate the use of the API; this configuration is not necessarily what gives the best results.
ComputationGraph vgg16Transfer = new TransferLearning.GraphBuilder(vgg16)
    .fineTuneConfiguration(fineTuneConf)
    .setFeatureExtractor("block5_pool")
    .nOutReplace("fc2", 1024, WeightInit.XAVIER)
    .removeVertexAndConnections("predictions")
    .addLayer("fc3", new DenseLayer.Builder()
        .activation(Activation.RELU)
        .nIn(1024).nOut(256).build(), "fc2")
    .addLayer("newpredictions", new OutputLayer
        .Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
        .activation(Activation.SOFTMAX)
        .nIn(256).nOut(numClasses).build(), "fc3")
    .setOutputs("newpredictions")
    .build();
Say we have saved off our model from (B) and now want to allow the "block5" layers to train.
ComputationGraph vgg16FineTune = new TransferLearning.GraphBuilder(vgg16Transfer)
    .fineTuneConfiguration(fineTuneConf)
    .setFeatureExtractor("block4_pool")
    .build();
We use the transfer learning helper API. Note that this freezes the layers of the model passed in.
Here is how you obtain the featurized version of the dataset at the specified layer "fc2":
TransferLearningHelper transferLearningHelper =
    new TransferLearningHelper(vgg16, "fc2");
while (trainIter.hasNext()) {
    DataSet currentFeaturized = transferLearningHelper.featurize(trainIter.next());
    saveToDisk(currentFeaturized, trainDataSaved, true);
    trainDataSaved++;
}
Here is how you can fit with a featurized dataset. vgg16Transfer is the model set up in (A) of section III.
TransferLearningHelper transferLearningHelper =
    new TransferLearningHelper(vgg16Transfer);
while (trainIter.hasNext()) {
    transferLearningHelper.fitFeaturized(trainIter.next());
}
Keep in mind this is a second model that leaves the original one untouched. For large pretrained networks, take memory requirements into consideration and adjust your JVM heap space accordingly.
The imported model's last layer (as seen when printing the summary) is a dense layer, not an output layer with a loss function. Therefore, to modify nOut of an output layer, we delete the layer vertex keeping its connections, and add back in a new output layer with the same name, a different nOut, the suitable loss function, etc.
When changing nOut users can specify a weight initialization scheme or a distribution for the layer as well as a separate weight initialization scheme or distribution for the layers it fans out to.
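For instance, reusing the names from the VGG16 example above (the nOut value is illustrative):

// The second WeightInit applies to the layers that fan out of "fc2" (here "predictions"),
// whose nIn must change along with fc2's new nOut.
ComputationGraph vgg16Widened = new TransferLearning.GraphBuilder(vgg16)
    .fineTuneConfiguration(fineTuneConf)
    .nOutReplace("fc2", 2048, WeightInit.XAVIER, WeightInit.XAVIER)
    .build();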
A model with frozen layers, when serialized and read back in, will not have any frozen layers. To continue training while holding specific layers constant, the user is expected to go through the transfer learning helper or the transfer learning API. There are two ways to "freeze" layers in a DL4J model:
- On a copy: With the transfer learning API which will return a new model with the relevant frozen layers
- In place: With the transfer learning helper API which will apply the frozen layers to the given model.
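For example, reusing the names from the VGG16 example above (both calls freeze everything up to and including "fc2"):

// On a copy: returns a new model whose layers up to and including "fc2" are frozen.
ComputationGraph frozenCopy = new TransferLearning.GraphBuilder(vgg16)
    .fineTuneConfiguration(fineTuneConf)
    .setFeatureExtractor("fc2")
    .build();

// In place: freezes the layers of the model passed in, up to and including "fc2".
TransferLearningHelper helper = new TransferLearningHelper(vgg16, "fc2");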
For example, if a learning rate is specified, this learning rate will apply to all unfrozen/trainable layers in the model. However, newly added layers can override this learning rate by specifying their own learning rates in the layer builder.
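As a sketch (assuming a DL4J version where the layer builder exposes learningRate, as the fine-tune configuration above does):

ComputationGraph vgg16Transfer = new TransferLearning.GraphBuilder(vgg16)
    .fineTuneConfiguration(fineTuneConf)               // e.g. 5e-5 applies to all trainable layers
    .setFeatureExtractor("fc2")
    .removeVertexKeepConnections("predictions")
    .addLayer("predictions",
        new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
            .nIn(4096).nOut(numClasses)
            .learningRate(1e-3)                        // this newly added layer overrides the fine-tune rate
            .weightInit(WeightInit.XAVIER)
            .activation(Activation.SOFTMAX).build(), "fc2")
    .build();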