I am following the Tensorflow MNIST tutorial. Reading through the theoretical / intuition section, I came to understand x, the input, as being a column matrix. In fact, when describing softmax, x is shown as a column matrix. However, declared in tensorflow, x looks like this:
x = tf.placeholder(tf.float32, [None, 784])
I read this as x being an array of variable length (None), with each element of this array being a column matrix of size 784. Even though x is declared as an array of column matrices, it is used as if it were just a column matrix:
y = tf.nn.softmax(tf.matmul(x, W) + b)
In the example, W and b are declared intuitively, as variables of shape [784, 10] and [10] respectively, which makes sense.
My questions are:
Does Tensorflow automatically perform the softmax operation for each column matrix in x?
Am I correct in assuming [None, value] means, intuitively, an array of variable size with each element being an array of size value? Or is it possible for [None, value] to also mean just an array of size value, without it being in a container array?
What is the correct way to link the theoretical description, where x is a column vector, to the implementation, where x is an array of column matrices?
Thanks for your help!
The intuition is for a single input sample (that's why you see a column vector). In practice, however, training is done using mini-batches, each consisting of a number of input samples (depending on the batch_size).
x = tf.placeholder(tf.float32, [None, 784])
This line makes a matrix of dimensions ? x 784, where ? denotes the batch size. The column vectors have, in a sense, become the rows of this new matrix.
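A quick NumPy sketch (illustrative only, not the tutorial's code) of how stacking samples as rows produces that ? x 784 matrix:

```python
import numpy as np

# Three fake "flattened MNIST images", each a vector of length 784.
batch = np.stack([np.zeros(784), np.ones(784), np.full(784, 0.5)])

# Each sample that is a column vector in the math is a ROW here, so the
# batch has shape (batch_size, 784) -- the "?" is 3 in this case.
print(batch.shape)  # (3, 784)
```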
Since we've converted our column vectors into rows, we interchange the order of multiplication of x and W. This is why your W has a dimension of 784 x 10 and b has a dimension of 10, which is applied to all rows.
After the first multiplication, x*W has dimension ? x 10. The same element b is added to every row of x*W. So if the first row of x*W is [1,2,3,4,5,6,7,8,9,0] and b is [1,1,1,1,1,1,1,1,1,1], the first row of the result will be [2,3,4,5,6,7,8,9,10,1]. If you are finding this hard to follow, try taking the transpose of W*x.
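That row-wise addition of b is just broadcasting; a minimal NumPy sketch reproducing the numbers above (the same thing happens inside TensorFlow's `+`):

```python
import numpy as np

# First row of x*W from the example above, plus a second row for contrast.
xW = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
               [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
b = np.ones(10)  # shape (10,) broadcasts against each row of the (2, 10) matrix

result = xW + b
print(result[0])  # [ 2.  3.  4.  5.  6.  7.  8.  9. 10.  1.]
```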
Coming to your questions,
Does Tensorflow automatically perform the softmax operation for each column matrix in x?
Yes, in your context. TensorFlow applies the softmax across all elements of dimension 1 (all the rows, in my interpretation above). So your resulting softmax will also have dimension ? x 10.
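A hand-rolled row-wise softmax in NumPy (illustrative; tf.nn.softmax does the equivalent internally), showing that the ? x 10 shape is preserved and each row sums to 1:

```python
import numpy as np

def softmax_rows(z):
    # Subtract each row's max for numerical stability, then normalize per row.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 0.0],
                   [0.0] * 10])  # shape (2, 10), i.e. "? x 10" with ? = 2
probs = softmax_rows(logits)

print(probs.shape)        # (2, 10) -- shape is unchanged
print(probs.sum(axis=1))  # each row sums to 1
```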
Am I correct in assuming [None, value] means, intuitively, an array of variable size with each element being an array of size value? Or is it possible for [None, value] to also mean just an array of size value, without it being in a container array?
Yes, the former is the correct interpretation. Also look at my ? matrix analogy above.
What is the correct way to link the theoretical description, where x is a column vector, to the implementation, where x is an array of column matrices?
I personally interpret this as a transpose of W*x. Elaborating: let x be a number of column vectors, [x1 x2 x3 x4 x5 ...], having dimension 784 x ?, where ? is the batch size. Let W have dimension 10 x 784. If you apply W to each column, you get [W*x1 W*x2 W*x3 ...], a number of column vectors of dimension 10, giving a net matrix of dimension 10 x ?.
Take the transpose of this entire operation: trans(W*x) = trans(x)*trans(W), which are the x and W in your code.
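The transpose identity can be checked numerically; a small sketch with random matrices standing in for the math-convention W (10 x 784) and a batch of column vectors x (784 x ?):

```python
import numpy as np

rng = np.random.default_rng(0)
W_math = rng.standard_normal((10, 784))  # W as in the math: 10 x 784
x_math = rng.standard_normal((784, 3))   # 3 column vectors of length 784

# trans(W*x) == trans(x)*trans(W): the right-hand side uses exactly the
# batch-as-rows x (? x 784) and the [784, 10] W from the TensorFlow code.
lhs = (W_math @ x_math).T
rhs = x_math.T @ W_math.T

print(np.allclose(lhs, rhs))  # True
print(rhs.shape)              # (3, 10), i.e. "? x 10"
```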