I had to complete the below homework. I lost all the points for this question because I didn't even know where to start. Frankly, we were taught none of this. I received no feedback for this homework, but I really want to learn how to figure these out. Can someone let me know where to start? I am extremely frustrated that I'm not getting help from my professor. All I want to do is learn something. I have googled the heck out of this and haven't been able to find anything that helped.
Suppose that I have a training set of m data points, with n features. Suppose further that I use a neural network with h hidden nodes to learn this data, and that it takes e epochs before backprop terminates. Answer each of the following questions with big-O notation (e.g., O(m^2n^5/h)).
a. How much memory is needed for storing this neural network, assuming each feature in the dataset corresponds to an input node? b. How much time is needed to train such a neural network classifier? c. How much time is needed to determine the class of a test point after training is complete?
Let me provide some basic guidance to get you started; you should then be able to complete the homework yourself.
Let's assume we have a neural network with n inputs, a hidden layer of h neurons, and m outputs (note that here m denotes the number of output neurons, which is a different quantity from the m training points in the question). In total we have this many neurons:

nbNeurons = h + m
Now, the amount of memory a neuron occupies is O(w), where w is the number of inputs the neuron receives. The reason is that a neuron has one weight per input plus some additional information such as a bias, a learning rate, an output, and an error. Since all of these are bounded by a constant, the memory required by one neuron is proportional to the number of weights, i.e., the number of inputs.
The hidden layer has h neurons with n inputs each, so it needs h*O(n) units of memory. The output layer has m neurons with h inputs each, hence m*O(h). Thus:

memory = h*O(n) + m*O(h) = O(h*n) + O(m*h) = O((n+m)*h)
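To make that concrete, here is a minimal sketch that counts the parameters (weights and biases) such a one-hidden-layer network stores; the function name and exact bookkeeping are illustrative, not part of the homework:

```python
def memory_units(n, h, m):
    """Count weights + biases, which dominate the memory footprint."""
    hidden = h * (n + 1)   # each hidden neuron: n weights + 1 bias
    output = m * (h + 1)   # each output neuron: h weights + 1 bias
    return hidden + output

# With h fixed, the total grows linearly in n and in m,
# consistent with O((n + m) * h).
print(memory_units(4, 3, 2))  # 3*(4+1) + 2*(3+1) = 23
```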
Feeding a neuron with k inputs requires k multiplications, k+1 additions, and one evaluation of the sigmoid (or a similar activation) function. This is clearly proportional to k and therefore O(k).
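As a sketch of that operation count (using a sigmoid activation as an assumed example):

```python
import math

def feed_neuron(weights, bias, inputs):
    """Feed one neuron with k inputs: k multiplications and k additions
    in the loop, the bias as one more addition, and one activation
    evaluation -- O(k) work in total."""
    s = bias
    for w, x in zip(weights, inputs):
        s += w * x                      # 1 multiplication + 1 addition per input
    return 1.0 / (1.0 + math.exp(-s))   # 1 sigmoid evaluation

print(feed_neuron([0.0, 0.0], 0.0, [1.0, 1.0]))  # sigmoid(0) = 0.5
```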
Feeding our network means feeding h hidden neurons with n inputs each, hence h*O(n), plus feeding m output neurons with h inputs each, hence m*O(h). So:

feed time = h*O(n) + m*O(h) = O((n+m)*h)
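Up to constant factors per neuron, the operation count for one forward pass can be sketched as:

```python
def forward_ops(n, h, m):
    """Operations for one forward pass, ignoring constant factors:
    each of the h hidden neurons does O(n) work, and each of the
    m output neurons does O(h) work."""
    return h * n + m * h

# Linear in h when n and m are fixed: O((n + m) * h).
print(forward_ops(10, 5, 2))  # 5*10 + 2*5 = 60
```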
Now, go ahead and apply the same reasoning to count the number of operations required for propagating the errors back when training the network with one sample of your training data: you will feed the network with the sample, and then adjust the weights and the bias of every neuron. Finally, multiply this per-sample quantity by the number of training samples processed per epoch and by the number of epochs (as you will repeat the same set of operations that many times).
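The loop structure that tells you which quantities multiply can be sketched as follows; the per-sample cost is left as an abstract parameter so the counting exercise stays yours:

```python
def training_ops(epochs, samples, cost_per_sample):
    """Total operations for training: the per-sample work (one forward
    pass plus one backward pass with weight/bias updates) is repeated
    for every sample in every epoch."""
    total = 0
    for _ in range(epochs):
        for _ in range(samples):
            total += cost_per_sample
    return total

print(training_ops(2, 3, 5))  # 2 epochs * 3 samples * 5 ops = 30
```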