I'm currently implementing a convolutional RBM and I'm using Theano for that.
My current implementation is pretty slow, and profiling showed that the bottleneck is the Gibbs sampling steps, where I use Theano's shared random streams to generate multinomial samples.
However, I found an improved version of Theano's random streams here, which meets my performance requirements.
Unfortunately, this experimental random generator only supports two-dimensional matrices, while I need to apply it to a tensor4 object (a 4D tensor), since that is what Theano's nnet.conv2d operation returns.
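For reference, this is the kind of 2D usage such a generator supports. I'm assuming here that it behaves like theano.sandbox.rng_mrg.MRG_RandomStreams (the linked generator may differ, but both expect a 2D pvals matrix, one distribution per row):

import numpy as np
import theano
import theano.tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams

# assumption: the experimental generator behaves like MRG_RandomStreams,
# i.e. multinomial() expects a 2D pvals matrix (one distribution per row)
rng = MRG_RandomStreams(seed=1234)

pvals = T.matrix('pvals')                    # rows are probability distributions
samples = rng.multinomial(n=1, pvals=pvals)  # one one-hot draw per row
f = theano.function([pvals], samples)

p = np.full((5, 10), 0.1, dtype=theano.config.floatX)  # 5 uniform distributions over 10 outcomes
print(f(p))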
Do you know if there's an efficient way to draw samples from a 4D tensor with the following structure:
Samplesize x 1 x N x M
where I would like to draw from the columns (third dimension), giving code like this:
for sample in range(numSamples):
    for col in range(numCols):
        drawMultinomial(n=1, pvals=data[sample, 0, col, :])
But this code would be really slow and I'd like to do this efficiently and on the GPU.
So any help would be much appreciated.
I found a solution that works for me: a rather simple dimshuffle/reshape combination that is undone after the sampling.
def sampleVisibleLayer(self, V):
    # flatten (batch, 1, N, M) into a (batch*M, N) matrix of probability rows
    reshaped = V.dimshuffle(0, 1, 3, 2).reshape((V.shape[0] * V.shape[3], V.shape[2]))
    # one multinomial draw (n=1) per row
    S_reshaped = self.theano_rng.multinomial(n=1, pvals=reshaped)
    # undo the reshape and dimshuffle to recover the original (batch, 1, N, M) layout
    S = S_reshaped.reshape((V.shape[0], 1, V.shape[3], V.shape[2])).dimshuffle(0, 1, 3, 2)
    return S
This solution works well for me, although it puts some limits on the batch size: because the reshaped matrix can become very large, the random generator can raise an error for large batches, even though this limit is not mentioned in the documentation.
The solution is also quite fast since dimshuffle and reshape are performed in O(1).
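For completeness, here is a minimal, self-contained sketch of how the trick can be wired into a compiled Theano function. I use MRG_RandomStreams here as a stand-in for the experimental generator (self.theano_rng in the snippet above), and the shapes are just example values:

import numpy as np
import theano
import theano.tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams

theano_rng = MRG_RandomStreams(seed=1234)  # stand-in for self.theano_rng

def sample_visible_layer(V):
    # same dimshuffle/reshape trick as above, written as a free function
    reshaped = V.dimshuffle(0, 1, 3, 2).reshape((V.shape[0] * V.shape[3], V.shape[2]))
    S_reshaped = theano_rng.multinomial(n=1, pvals=reshaped)
    return S_reshaped.reshape((V.shape[0], 1, V.shape[3], V.shape[2])).dimshuffle(0, 1, 3, 2)

V = T.tensor4('V')
sample_fn = theano.function([V], sample_visible_layer(V))

# toy input of shape (Samplesize, 1, N, M) = (2, 1, 3, 4);
# the probabilities must sum to 1 along the sampled axis (here N, axis 2)
p = np.full((2, 1, 3, 4), 1.0 / 3.0, dtype=theano.config.floatX)
print(sample_fn(p).shape)  # (2, 1, 3, 4), one-hot along axis 2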