Tags: optimization, machine-learning, neural-network, deep-learning, hyperparameters

Optimize hyperparameters for deep network


I am currently trying to come up with a novel structure for a CLDNN (Convolutional, LSTM, Deep Neural Network).

Just as with any other network, I am having a difficult time optimizing the hyperparameters.

I would like to try grid search and random search to find an optimal set of hyperparameters, but I am not clear on a few things.

  1. If I run a simulation of the network with a candidate set of hyperparameters, how do I measure the "goodness" of those hyperparameters? I was thinking of recording the cost and training accuracy after N epochs for each simulation.

  2. Since each simulation takes a relatively long time (my network takes about 70 seconds to train for one epoch), is there a faster way to check the "goodness" of the hyperparameters without running the full training?

  3. Are there any general tips/advice for hyperparameter optimization?


Solution

    1. So, basically, to measure performance across different hyperparameters, the best practice is to simulate the process of training your final classifier on the training data for each parameter setup, and then compare the results with respect to the measures you want to optimize over.
    2. If you change the training process (e.g. by fixing a small number of epochs during the hyperoptimization phase and then using a different number in the final training), you shouldn't expect the results obtained across those shortened testing runs to generalize. In my opinion this might harm your optimization process, especially since some hyperparameter setups need more time to reach good results (e.g. a really high dropout rate), and cutting training time while choosing the best values tends to favour setups that score better at an earlier training stage.
    3. Good practices?:
      • choose random search, not grid search. Usually your network is less sensitive to some parameters than to others, so a full grid search is a waste of time,
      • if you want to try something more sophisticated, you could try e.g. Bayesian hyperoptimization,
      • use cross-validation, or run your network with a given hyperparameter setup multiple times. Neural networks can be sensitive to the initial weights, so a score from a single run might not generalize well,
      • parallelize your training process, e.g. run the training runs on different machines and then simply merge the results.
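A minimal sketch of the random-search-with-repeated-runs idea above. `train_and_evaluate` is a toy stand-in for your actual (expensive) CLDNN training run, and the hyperparameter names and ranges are illustrative assumptions, not taken from the question:

```python
import random

# Hypothetical stand-in for "train the network and return validation score".
# Replace with your real training/evaluation run; the formula below is a
# toy surface that just peaks near lr=1e-3, dropout=0.5 for demo purposes.
def train_and_evaluate(learning_rate, dropout, hidden_units, seed):
    noise = random.Random(seed).uniform(-0.01, 0.01)  # run-to-run variance
    return 1.0 - abs(learning_rate - 1e-3) * 100 - abs(dropout - 0.5) + noise

def random_search(n_trials=20, n_repeats=3, rng_seed=0):
    rng = random.Random(rng_seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Sample each hyperparameter independently (log-uniform for the
        # learning rate, uniform for dropout, a discrete choice for width).
        params = {
            "learning_rate": 10 ** rng.uniform(-4, -2),
            "dropout": rng.uniform(0.2, 0.8),
            "hidden_units": rng.choice([64, 128, 256]),
        }
        # Average over several seeds: a single run's score may not
        # generalize because of sensitivity to weight initialization.
        scores = [train_and_evaluate(seed=s, **params) for s in range(n_repeats)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

best_params, best_score = random_search()
print(best_params, best_score)
```

Because each trial is independent, the loop body is also the natural unit to farm out to different machines when parallelizing, merging the `(params, score)` pairs afterwards.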