I am trying to make a basic nonlinear regression model that will predict the return index of companies in the FTSE350.
I am unsure what shape my bias terms should be and whether I am using them correctly in the calculations method:
w1 = tf.Variable(tf.truncated_normal([4, 10], mean=0.0, stddev=1.0, dtype=tf.float64))
b1 = tf.Variable(tf.constant(0.1, shape=[4, 10], dtype=tf.float64))
w2 = tf.Variable(tf.truncated_normal([10, 1], mean=0.0, stddev=1.0, dtype=tf.float64))
b2 = tf.Variable(tf.constant(0.1, shape=[1], dtype=tf.float64))

def calculations(x, y):
    w1d = tf.matmul(x, w1)
    h1 = tf.nn.sigmoid(tf.add(w1d, b1))
    h1w2 = tf.matmul(h1, w2)
    activation = tf.add(tf.nn.sigmoid(h1w2), b2)
    error = tf.reduce_sum(tf.pow(activation - y, 2)) / len(x)
    return [activation, error]
My initial thought was that the bias should be the same shape as my weights, but I get this error:
ValueError: Dimensions must be equal, but are 251 and 4 for 'Add' (op: 'Add') with input shapes: [251,10], [4,10]
I've played around with different ideas but don't seem to be getting anywhere.
(My input data has 4 features)
The network structure I have attempted is 4 neurons in the input layer, 10 in the hidden layer, and 1 in the output layer, but I feel like I may have mixed up the dimensions in my weight matrices too?
When you are constructing the layers of a feed-forward fully-connected neural network (like the one in your example), the shape of each bias should equal the number of nodes in the corresponding layer. So in your case, since your weight matrix has a shape of (4, 10), you have 10 nodes in that layer and you should be using:
b1 = tf.Variable(tf.constant(0.1, shape=[10], dtype=tf.float64))
The reason for this is that when you do w1d = tf.matmul(x, w1), you actually get a matrix of shape (batch_size, 10) (where batch_size is the number of rows in your input matrix). This is because you are matrix-multiplying a (batch_size, 4) matrix by a (4, 10) weight matrix. Then you are adding a bias to each column of w1d; those per-column biases can be represented as a 10-dimensional vector, which is exactly what you get if you make the shape of b1 equal to [10].
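If it helps to see the shapes concretely, here is a quick NumPy sketch of the same computation (NumPy broadcasting follows the same rule TensorFlow applies in tf.add; the batch size of 251 is just taken from your error message):

import numpy as np

batch_size = 251                    # number of rows in the input, per your error message
x = np.random.randn(batch_size, 4)  # a batch of inputs with 4 features each
w1 = np.random.randn(4, 10)         # first-layer weights
b1 = np.full(10, 0.1)               # bias: one value per hidden node, shape (10,)

w1d = x @ w1                        # (251, 4) @ (4, 10) -> (251, 10)
h1 = w1d + b1                       # (10,) broadcasts across all 251 rows
print(h1.shape)                     # (251, 10)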
Without the non-linearity (sigmoid) afterward, this is called an affine transformation, which you can read more about here: https://en.wikipedia.org/wiki/Affine_transformation.
Another fantastic resource is the Stanford Deep Learning Tutorial, which has a good explanation of how these feed-forward models work here: http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/.
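For completeness, here is a minimal corrected sketch of your setup with that one change applied (TF 1.x API, as in your snippet). Note that I also swapped your tf.reduce_sum(...) / len(x) for tf.reduce_mean, which is my own simplification: it computes the same per-batch average without calling len on a tensor:

import tensorflow as tf

w1 = tf.Variable(tf.truncated_normal([4, 10], mean=0.0, stddev=1.0, dtype=tf.float64))
b1 = tf.Variable(tf.constant(0.1, shape=[10], dtype=tf.float64))  # one bias per hidden node
w2 = tf.Variable(tf.truncated_normal([10, 1], mean=0.0, stddev=1.0, dtype=tf.float64))
b2 = tf.Variable(tf.constant(0.1, shape=[1], dtype=tf.float64))   # one bias for the single output node

def calculations(x, y):
    w1d = tf.matmul(x, w1)                        # (batch_size, 4) @ (4, 10) -> (batch_size, 10)
    h1 = tf.nn.sigmoid(tf.add(w1d, b1))           # b1 broadcasts across the batch dimension
    h1w2 = tf.matmul(h1, w2)                      # (batch_size, 10) @ (10, 1) -> (batch_size, 1)
    activation = tf.add(tf.nn.sigmoid(h1w2), b2)  # same structure as your original
    error = tf.reduce_mean(tf.square(activation - y))  # mean squared error over the batch
    return [activation, error]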
Hope that helped!