
Theano max_pool_3d


How do I extend Theano's downsample.max_pool_2d_same_size so that it pools not only within a feature map but also across feature maps, in an efficient manner?

Let's say I have 3 feature maps, each of size 10x10, i.e. a 4D tensor of shape (1,3,10,10). First, max pool each (10,10) feature map with non-overlapping (2,2) windows. The result is 3 sparse feature maps, still of size (10,10) but with most values equal to zero: within each (2,2) window, at most one value is greater than zero. This is what downsample.max_pool_2d_same_size does.
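
For reference, the same-size pooling described above can be sketched in plain NumPy. This is only an illustration of the behaviour, not Theano's implementation: the function name `max_pool_2d_same_size_np` is made up here, and the sketch assumes that height and width are divisible by the window size and that tied maxima are all kept.

```python
import numpy as np

def max_pool_2d_same_size_np(x, pool=(2, 2)):
    """Keep only the maximum of each non-overlapping window; zero the rest.

    x has shape (batch, channels, height, width). Hypothetical helper,
    assuming height and width are divisible by the pool size.
    """
    b, c, h, w = x.shape
    ph, pw = pool
    # Split each spatial axis into (window index, position inside window).
    windows = x.reshape(b, c, h // ph, ph, w // pw, pw)
    # Maximum of every window, kept broadcastable against `windows`.
    window_max = windows.max(axis=(3, 5), keepdims=True)
    # Entries equal to their window maximum survive (ties keep every copy).
    mask = windows == window_max
    return (windows * mask).reshape(x.shape)

x = np.arange(16, dtype=np.float32).reshape(1, 1, 4, 4)
y = max_pool_2d_same_size_np(x)
# The window maxima 5, 7, 13 and 15 stay in place; all other entries are zero.
print(y[0, 0])
```

The output has the same shape as the input, so the position of each surviving value is also the location of the maximum.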

Next, I want to compare the maximum of each (2,2) window with the maxima of the windows at the same position in all other feature maps, and keep only the largest value across all feature maps. The result is again 3 feature maps of size (10,10), with nearly all values equal to zero.
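
To make the desired result concrete, here is a slow but simple NumPy reference for pooling across feature maps as well. It is a sketch under the same assumptions as before (non-overlapping windows, spatial size divisible by the window size); the name `max_pool_across_maps_same_size` is hypothetical, and it is meant as a specification of the output, not as the fast solution being asked for.

```python
import numpy as np

def max_pool_across_maps_same_size(x, pool=(2, 2)):
    """Keep one maximum per window position across ALL feature maps.

    x has shape (batch, channels, height, width). Hypothetical helper,
    assuming height and width are divisible by the pool size.
    """
    b, c, h, w = x.shape
    ph, pw = pool
    # Split each spatial axis into (window index, position inside window).
    windows = x.reshape(b, c, h // ph, ph, w // pw, pw)
    # Maximum over the channel axis and both in-window axes.
    joint_max = windows.max(axis=(1, 3, 5), keepdims=True)
    # Only the entry that wins across all maps survives in each window.
    mask = windows == joint_max
    return (windows * mask).reshape(x.shape)

rng = np.random.RandomState(0)
# 300 distinct positive values, so no window has a tie or a zero maximum.
x = (rng.permutation(300) + 1).astype(np.float32).reshape(1, 3, 10, 10)
y = max_pool_across_maps_same_size(x)
print(y.shape)              # (1, 3, 10, 10)
print(np.count_nonzero(y))  # 25: one survivor per (2,2) window position
```

With 3 maps of size 10x10 there are 25 window positions, so exactly 25 of the 300 values remain nonzero.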

Is there a fast way of doing this? I wouldn't mind using other max pooling functions, but I need the exact locations of the maxima for pooling/unpooling purposes (but that's another topic).


Solution

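  • I solved it using Lasagne with cuDNN. Here are some minimal examples of how to get the locations of the maxima of a max pooling operation (2D and 3D). The result is a mask with the same shape as the input that is nonzero at the positions of the maxima. See https://groups.google.com/forum/#!topic/lasagne-users/BhtKsRmFei4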

    import numpy as np
    import theano
    import theano.tensor as T
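    from theano.tensor.type import TensorType
    import lasagne
    import lasagne.layers.dnn  # cuDNN layers are not imported automatically
    
    def tensor5(name=None, dtype=None):
        """Create a symbolic 5D tensor (batch, channels, 3 spatial dims)."""
        if dtype is None:
            dtype = theano.config.floatX
        # No dimension is broadcastable.
        tensor_type = TensorType(dtype, (False,) * 5)
        return tensor_type(name)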
    
    def max_pooling_2d():
        input_var = T.tensor4('input')
        input_layer = lasagne.layers.InputLayer(shape=(None, 2, 4, 4), input_var=input_var)
        max_pool_layer = lasagne.layers.MaxPool2DLayer(input_layer, pool_size=(2, 2))
    
        pool_in, pool_out = lasagne.layers.get_output([input_layer, max_pool_layer])
        # Backpropagate ones through the pooling layer: the gradient w.r.t. the
        # input is nonzero only at the positions that were selected as maxima.
        indices = T.grad(None, wrt=pool_in, known_grads={pool_out: T.ones_like(pool_out)})
        get_indices_fn = theano.function([input_var], indices, allow_input_downcast=True)
    
        data = np.random.randint(low=0, high=9, size=32).reshape((1, 2, 4, 4))
        indices = get_indices_fn(data)
        print(data, "\n\n", indices)
    
    def max_pooling_3d():
        input_var = tensor5('input')
        input_layer = lasagne.layers.InputLayer(shape=(1, 1, 2, 4, 4), input_var=input_var)
        # 5 input dimensions: (batchsize, channels, 3 spatial dimensions)
        max_pool_layer = lasagne.layers.dnn.MaxPool3DDNNLayer(input_layer, pool_size=(2, 2, 2))
    
        pool_in, pool_out = lasagne.layers.get_output([input_layer, max_pool_layer])
        # Same gradient trick as in the 2D case, now over (2, 2, 2) windows.
        indices = T.grad(None, wrt=pool_in, known_grads={pool_out: T.ones_like(pool_out)})
        get_indices_fn = theano.function([input_var], indices, allow_input_downcast=True)
    
        data = np.random.randint(low=0, high=9, size=32).reshape((1, 1, 2, 4, 4))
        indices = get_indices_fn(data)
        print(data, "\n\n", indices)
    
    if __name__ == "__main__":
        max_pooling_2d()
        max_pooling_3d()