We are currently doing a little experiment with machine learning with Deeplearning4j.
We have voltage measurements in time series from different devices that I know that depends on each other. We manage to labeling huge amount of those data with one and zeroes.
Our problem is to figure out the use of layers for the model.
For us it seems that it is experience that it is used among people and examples seems to be random.
We currently using LSTM and RNN
But how can we clarify if there is better models?
We would like to see if the model can figure out some dependencies through predictions that we haven’t noticed.
The best way to go about this, is to start by looking at your data and what you want to get out of it. Then you should start out by setting up a base line. Use the simplest possible modelling technique you are familiar with just so you have anything at all.
In your case it looks like you have a label for each timestep. So, you might just use simple linear regression for each timestep separately to get a feel for what you would get if you don't incorporate any sequence information at all. Anything that works fast is a good candidate for this step.
Once you have that baseline, you can start looking at building a deeplearning model that outperforms this baseline.
For time series data, you have two options at the moment in DL4J, either you use a recurrent layer like LSTM, or you use convolutions over time.
If you want to have an output at each timestep, then a recurrent layer is probably better for you. The convolutional approach usually works best if you want to have just a single result after reading in the whole sequence.
For choosing how wide those layers should be, and how many layers you should use, you will have to experiment a bit.
The first thing that you want to achieve is to build a model that can over-fit on a subset of your data. So you start out, by passing in only a single batch of examples over and over again. If the model can't overfit on that, you make the layers wider. If the layers start getting too wide, you add another layer on top.
If you use the deeplearning4j-ui module, it will tell you how many parameters your model currently has. They should usually be less than the number of total examples you have, or you risk overfitting on your full data set.
As soon as you can train a model to overfit on a small subset of your data, you can start training it with all of your data.
At that point you can then start looking into finding better hyperparameters and seeing by how much you can beat your baseline.