I was reading a paper and the authors described their network as follows:
"To train the corresponding deep network, a fully connected network with one hidden layer is used. The network has nine binary input nodes. The hidden layer contains one sigmoid node, and in the output layer there is one inner product function. Thus, the network has 10 variables."
The network is used to predict a continuous number (y). My problem is, I do not understand the structure of the network after the sigmoid node. What does the output layer do? What is the inner product used for?
Usually, the pre-activation of a neuron is a combination of an inner product (a dot product, in the vector-vector case) and an addition that introduces a bias. A single neuron can be described as
z = b + w1*x1 + w2*x2 + ... + wn*xn
= b + w'*x
h = activation(z)
where b is an additive term (the neuron's bias), w is the weight vector, and each h is the output of one layer and corresponds to the input of the following layer. In the case of the "output layer", we simply have y = h. A layer might also consist of multiple neurons or, like in your example, of a single neuron.
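As a minimal sketch, a single neuron could be computed in Python with NumPy as follows (the example values and the names sigmoid, z, and h are mine, purely for illustration):

import numpy as np

def sigmoid(z):
    # logistic function: squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 0.0, 1.0])    # example inputs
w = np.array([0.5, -0.3, 0.8])   # one weight per input
b = 0.1                          # the neuron's bias
z = b + np.dot(w, x)             # inner product plus bias (pre-activation)
h = sigmoid(z)                   # the neuron's output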
In the described case, it seems like no bias is used. I understand it as follows:
For each input neuron x1 to x9, a single weight is used, nothing fancy here. Since there are nine inputs, this makes 9 weights, resulting in something like:
hidden_out = sigmoid(w1*x1 + w2*x2 + ... + w9*x9)
In order to connect the hidden layer to the output, the same rule applies: the output layer's inputs are weighted and then summed. Since there is only one input (the hidden neuron's output), the "sum" consists of a single weighted term, such that
output = w10*hidden_out
Keep in mind that the sigmoid function squashes its input into the range 0..1, so multiplying its output by a weight rescales it to the required output range.
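Putting it all together, the whole forward pass of the network as I read it could look like this in Python/NumPy (a sketch under my interpretation: no biases, and random weights standing in for trained ones):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w_hidden, w_out):
    # hidden layer: one sigmoid neuron over the nine binary inputs
    hidden_out = sigmoid(np.dot(w_hidden, x))
    # output layer: a single inner product, i.e. one weight and no activation
    return w_out * hidden_out

x = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1])  # nine binary inputs
w_hidden = np.random.randn(9)              # 9 weights into the hidden neuron
w_out = np.random.randn()                  # 1 weight to the output -> 10 parameters in total
y = forward(x, w_hidden, w_out)            # predicted continuous value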