I am a programming enthusiast, so please excuse me and help fill any gaps. From what I understand, getting good results from a neural network requires the sigmoid (activation) and either the learning rate or step rate (depending on the training method) to be set correctly, along with the number of training iterations.
While there is plenty of material about these values, the principle of generalization, and avoiding overfitting, there doesn't seem to be much focus on their relationship with the data and the network itself.
I've noticed that the best settings seem to scale with the number of samples, neurons, and inputs (more or fewer inputs may change the required iterations, for example).
Is there a mathematical way to find a good (approximate) starting point for the sigmoid, learning rate, steps, iterations, and the like, based on known values such as samples, inputs, outputs, layers, etc.?
Before the deep learning explosion, one common way to determine the best number of parameters in your network was to use Bayesian regularization. Bayesian regularization is a method to avoid overfitting even if your network is larger than necessary.
Regarding the learning/step rate, the problem is that a small step rate can make learning notoriously slow, while a large one may make your network diverge. Thus, a common technique was to use a learning method that could automatically adjust the learning rate, accelerating when possible and decelerating in certain regions of the gradient.
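To make the "accelerate/decelerate" idea concrete, here is a minimal sketch of one such heuristic (sometimes called the "bold driver" schedule): grow the learning rate after every improving step, shrink it and reject the step after a worsening one. This is only an illustration on a toy quadratic loss, not the specific scheme used by any particular library.

```python
import numpy as np

def bold_driver(loss, grad, w, lr=0.1, up=1.05, down=0.5, iters=200):
    """Gradient descent with an adaptive learning rate: accelerate
    after every improving step, decelerate after a worsening one."""
    prev = loss(w)
    for _ in range(iters):
        trial = w - lr * grad(w)
        cur = loss(trial)
        if cur < prev:            # improvement: accept the step, speed up
            w, prev = trial, cur
            lr *= up
        else:                     # overshoot: reject the step, slow down
            lr *= down
    return w

# Toy quadratic loss with minimum at (1, -2) -- purely illustrative.
target = np.array([1.0, -2.0])
loss = lambda w: 0.5 * np.sum((w - target) ** 2)
grad = lambda w: w - target
w = bold_driver(loss, grad, np.zeros(2))
```

The key point is that the schedule never needs a hand-tuned "correct" rate: a too-large rate is automatically punished and halved, while a safe rate keeps growing.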
As such, a common way to train neural networks while taking care of both problems was to use the Levenberg-Marquardt learning algorithm with Bayesian regularization. The Levenberg-Marquardt algorithm is adaptive in the sense that it can adjust the effective step size after every iteration, switching between Gauss-Newton updates (using second-order information) and gradient-descent updates (using only first-order information) as needed.
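The Gauss-Newton/gradient-descent switch is controlled by a single damping factor. Here is a bare-bones sketch of that mechanism on a toy curve-fitting problem (an illustration of the general algorithm, not the trainbr or Accord.NET implementation): each step solves (JᵀJ + λI)·δ = −Jᵀr, where small λ behaves like Gauss-Newton and large λ behaves like a scaled gradient-descent step, with λ adapted per iteration.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, p, iters=50, lam=1e-2):
    """Minimal Levenberg-Marquardt: small lam ~ Gauss-Newton,
    large lam ~ gradient descent; lam is adapted every iteration."""
    for _ in range(iters):
        r, J = residual(p), jacobian(p)
        A = J.T @ J + lam * np.eye(len(p))      # damped normal equations
        delta = np.linalg.solve(A, -J.T @ r)
        trial = p + delta
        if np.sum(residual(trial) ** 2) < np.sum(r ** 2):
            p, lam = trial, lam * 0.5   # good step: trust Gauss-Newton more
        else:
            lam *= 2.0                  # bad step: fall back toward gradient descent
    return p

# Toy fit: y = a * exp(b * x) with true parameters (a, b) = (2, -1).
x = np.linspace(0.0, 2.0, 20)
y = 2.0 * np.exp(-1.0 * x)
res = lambda p: p[0] * np.exp(p[1] * x) - y
jac = lambda p: np.column_stack([np.exp(p[1] * x),
                                 p[0] * x * np.exp(p[1] * x)])
p = levenberg_marquardt(res, jac, np.array([1.0, 0.0]))
```

Note that λ plays the role your question assigns to the learning rate, but it is never hand-tuned: rejected steps raise it, accepted steps lower it.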
It can also give you an estimate of the effective number of parameters that you really need in your network. The number of parameters is the total number of weights considering all neurons in the network. You can then use this estimate to decide how many neurons you should be using in the first place.
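For reference, the total parameter count of a fully connected network follows directly from the layer sizes: each layer contributes (inputs + 1) × outputs weights, the +1 accounting for the bias. A small sketch (the layer sizes here are made up for illustration):

```python
def count_parameters(layer_sizes):
    """Total weights + biases of a fully connected network,
    given layer_sizes = [inputs, hidden1, ..., outputs]."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# A 4-input, 10-hidden, 2-output network:
n = count_parameters([4, 10, 2])   # (4+1)*10 + (10+1)*2 = 72
```

Comparing this count against the effective number of parameters reported by Bayesian regularization tells you whether the network is oversized.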
This method is implemented by the MATLAB function trainbr. However, since you also included the accord-net tag, I should also say that it is implemented by the LevenbergMarquardtLearning class (you might want to use the latest alpha version in NuGet in case you are dealing with multiple output problems).