
Understanding the Tensorflow MNIST tutorial - Is the input x a column matrix or an array of column matrices?


I am following the Tensorflow MNIST tutorial.

Reading through the theoretical / intuition section, I came to understand x, the input, as being a column matrix.

In fact, when describing softmax, x is shown as a column matrix:

[Image: softmax intuition, with x shown as a column matrix]

However, declared in tensorflow, x looks like this:

x = tf.placeholder(tf.float32, [None, 784])

I read this as x being an array of variable length ( None ), with each element of this array being a column matrix of size 784.

Even though x is declared as an array of column matrices, it is used as if it were just a column matrix:

y = tf.nn.softmax(tf.matmul(x, W) + b)

In the example, W and b are declared intuitively, as variables of shape [784, 10] and [10] respectively, which makes sense.

My questions are:

  1. Does Tensorflow automatically perform the softmax operation for each column matrix in x?

  2. Am I correct in assuming [None, value] means, intuitively, an array of variable size with each element being an array of size value? Or is it possible for [None, value] to also mean just an array of size value? ( without it being in a container array )

  3. What is the correct way to link the theoretical description, where x is a column vector, to the implementation, where x is an array of column matrices?

Thanks for your help!


Solution

  • The intuition is for a single input sample (which is why you see a column vector). In practice, however, training is done using mini-batches, each consisting of a number of input samples (depending on batch_size).

    x = tf.placeholder(tf.float32, [None, 784])
    

    This line creates a matrix of dimensions ? x 784, where ? denotes the batch size. The column vectors have, in a sense, become the rows of this new matrix.

    Since we've converted our column vectors into rows, we interchange the order of multiplication of x and W. This is why your W has a dimension of 784 x 10 and b has a dimension of 10, which is applied to all rows. After the multiplication, x*W has dimension ? x 10. The same b is added to every row of x*W: if the first row of x*W is [1,2,3,4,5,6,7,8,9,0] and b is [1,1,1,1,1,1,1,1,1,1], the first row of the result will be [2,3,4,5,6,7,8,9,10,1]. If you find this hard to picture, try taking the transpose of W*x.
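    The shape and broadcasting behaviour above can be sketched with NumPy (used here in place of TensorFlow, since the matmul and broadcasting rules are the same; the batch size of 3 is just an arbitrary choice):

    ```python
    import numpy as np

    batch_size = 3                      # the "?" (None) dimension
    x = np.ones((batch_size, 784))      # each row is one flattened image
    W = np.zeros((784, 10))             # zeros just to keep the example simple
    b = np.arange(10, dtype=float)      # one bias per output class

    logits = x @ W + b                  # b is broadcast onto every row
    print(logits.shape)                 # (3, 10)
    ```

    With W all zeros, x @ W is a ? x 10 matrix of zeros, so every row of `logits` is exactly b, which makes the row-wise broadcast of the bias easy to see.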

    Coming to your questions,

    Does Tensorflow automatically perform the softmax operation for each column matrix in x?

    Yes, in your context. TensorFlow applies the softmax along dimension 1 (across each row, in my interpretation above), so the resulting softmax output also has dimension ? x 10.
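    This row-wise behaviour can be checked with a small NumPy sketch of softmax (an assumption-free stand-in for tf.nn.softmax on a 2-D tensor):

    ```python
    import numpy as np

    def softmax(z):
        # Numerically stable softmax applied row-wise (axis=1),
        # mirroring how tf.nn.softmax treats a [batch, classes] tensor.
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    logits = np.array([[1.0, 2.0, 3.0],
                       [0.0, 0.0, 0.0]])
    probs = softmax(logits)
    # Each row is normalized independently, so each row sums to 1.
    ```

    Each row of `probs` is a separate probability distribution over the classes, one per input sample in the batch.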

    Am I correct in assuming [None, value] means, intuitivly, an array of variable size with each element being an array of size value? Or is it possible for [None, value] to also mean just an array of size value? ( without it being in a container array )

    Yes, the former is the correct interpretation. Also look at my ? matrix analogy above.

    What is the correct way to link the theoretical description, where x is a column vector to the implementation, where x is an array of column matrices?

    I personally interpret this as a transpose of W*x. Elaborating: let x be a collection of column vectors [x1 x2 x3 x4 x5 ...] with dimension 784 x ?, where ? is the batch size, and let W have dimension 10 x 784. Applying W to each column gives [W*x1 W*x2 W*x3 ...], a collection of column vectors of dimension 10, i.e. a matrix of dimension 10 x ?.

    Taking the transpose of this entire operation, trans(W*x) = trans(x)*trans(W), gives exactly the x and W used in your code.
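    The identity trans(W*x) = trans(x)*trans(W) is easy to verify numerically (the shapes below follow the theory-style convention from the answer; the random data is arbitrary):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(10, 784))   # theory-style weights: 10 x 784
    x = rng.normal(size=(784, 5))    # 5 samples as column vectors: 784 x 5

    lhs = (W @ x).T                  # transpose of the theoretical product
    rhs = x.T @ W.T                  # the x*W form used in the tutorial code
    assert np.allclose(lhs, rhs)     # both are 5 x 10
    ```

    So the code's `tf.matmul(x, W)` with row-vector samples is just the transpose of the theoretical W*x with column-vector samples.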