
How is numpy pad implemented (for constant value)


I'm trying to implement the numpy pad function in theano for the constant mode. How is it implemented in numpy? Assume that pad values are just 0.

Given an array

a = np.array([[1,2,3,4],[5,6,7,8]])
# pad values are just 0 as indicated by constant_values=0
np.pad(a, pad_width=[(1,2),(3,4)], mode='constant', constant_values=0)

would return

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 2, 3, 4, 0, 0, 0, 0],
       [0, 0, 0, 5, 6, 7, 8, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

Now if I know the number of dimensions of a beforehand, I can implement this by creating a new array with the output dimensions, filled with the pad value, and then assigning the input to the corresponding elements of that array. But what if I don't know the number of dimensions of the input array? While I can still infer the shape of the output array from the input array, I have no way of indexing it without knowing how many dimensions it has. Or am I missing something?

That is, if I know that the input has, say, 3 dimensions, then I could do:

zeros_array[pad_width[0][0]:-pad_width[0][1], pad_width[1][0]:-pad_width[1][1], pad_width[2][0]:-pad_width[2][1]] = a

where zeros_array is the new array created with the output dimensions.

But if I don't know ndim beforehand, I cannot do this.


Solution

  • My instinct is to do:

    def ...(arg, pad):
        out_shape = <arg.shape + padding>  # math on tuples/lists
        idx = [slice(x1, x2) for ...]   # again math on shape and padding
        res = np.zeros(out_shape, dtype=arg.dtype)
        res[tuple(idx)] = arg   # index with a tuple of slices, one per axis
        return res
    

    In other words, make the target array, and copy the input into it with the appropriate indexing tuple. It will require some math, and maybe iteration, to construct the required shape and slices, but that should be straightforward, if tedious.
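
    A minimal runnable sketch of that idea, assuming pad is a sequence of (before, after) pairs with one pair per axis, as in np.pad. The name pad_constant and the value parameter are made up for illustration. Each slice stops at before + n rather than at -after, because a stop of -0 would select nothing when an axis has no trailing pad:

```python
import numpy as np

def pad_constant(arr, pad_width, value=0):
    """Pad `arr` with a constant, for any number of dimensions.

    `pad_constant` is a hypothetical name; `pad_width` holds one
    (before, after) pair per axis, as in np.pad.
    """
    arr = np.asarray(arr)
    if len(pad_width) != arr.ndim:
        raise ValueError("need one (before, after) pair per axis")
    # Output shape: each axis grows by before + after.
    out_shape = tuple(n + before + after
                      for n, (before, after) in zip(arr.shape, pad_width))
    # One slice per axis, built without knowing ndim in advance.
    idx = tuple(slice(before, before + n)
                for n, (before, _) in zip(arr.shape, pad_width))
    res = np.full(out_shape, value, dtype=arr.dtype)
    res[idx] = arr
    return res

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
padded = pad_constant(a, [(1, 2), (3, 4)])
```

    With the array from the question, padded should have shape (5, 11) and agree with the np.pad call shown there. The only per-dimension work is building the shape and slice tuples, so the same function handles 1-D, 3-D, or higher inputs.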

    However, it appears that np.pad iterates over the axes (if I've identified the correct alternative):

    newmat = narray.copy()
    for axis, ((pad_before, pad_after), (before_val, after_val)) \
            in enumerate(zip(pad_width, kwargs['constant_values'])):
        newmat = _prepend_const(newmat, pad_before, before_val, axis)
        newmat = _append_const(newmat, pad_after, after_val, axis)
    

    where _prepend_const, for a pad value of 0, comes down to:

    np.concatenate((np.zeros(padshape, dtype=arr.dtype), arr), axis=axis)
    

    (and _append_const would be similar, with the pad block placed after arr). So it adds the leading and trailing pieces separately for each dimension. Conceptually that is simple, even if it might not be the fastest approach.

    In [601]: np.lib.arraypad._prepend_const(np.ones((3,5)),3,0,0)
    Out[601]: 
    array([[ 0.,  0.,  0.,  0.,  0.],
           [ 0.,  0.,  0.,  0.,  0.],
           [ 0.,  0.,  0.,  0.,  0.],
           [ 1.,  1.,  1.,  1.,  1.],
           [ 1.,  1.,  1.,  1.,  1.],
           [ 1.,  1.,  1.,  1.,  1.]])
    
    In [604]: arg=np.ones((3,5),int)
    In [605]: for i in range(2):
         ...:     arg=np.lib.arraypad._prepend_const(arg,1,0,i)
         ...:     arg=np.lib.arraypad._append_const(arg,2,2,i)
         ...:     
    In [606]: arg
    Out[606]: 
    array([[0, 0, 0, 0, 0, 0, 2, 2],
           [0, 1, 1, 1, 1, 1, 2, 2],
           [0, 1, 1, 1, 1, 1, 2, 2],
           [0, 1, 1, 1, 1, 1, 2, 2],
           [0, 2, 2, 2, 2, 2, 2, 2],
           [0, 2, 2, 2, 2, 2, 2, 2]])
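
    The helpers called in that session are private, so they may be absent from the NumPy you have installed. Below is a rough stand-in for the same per-axis concatenation, assuming one constant per call. The names prepend_const, append_const and pad_by_concatenation are hypothetical rewrites, not NumPy functions:

```python
import numpy as np

def _pad_block(arr, pad_amt, val, axis):
    # Same shape as arr, except pad_amt long along `axis`, filled with val.
    padshape = tuple(pad_amt if i == axis else n
                     for i, n in enumerate(arr.shape))
    return np.full(padshape, val, dtype=arr.dtype)

def prepend_const(arr, pad_amt, val, axis):
    """Hypothetical stand-in: put pad_amt entries of val before arr along axis."""
    if pad_amt == 0:
        return arr
    return np.concatenate((_pad_block(arr, pad_amt, val, axis), arr), axis=axis)

def append_const(arr, pad_amt, val, axis):
    """Hypothetical stand-in: put pad_amt entries of val after arr along axis."""
    if pad_amt == 0:
        return arr
    return np.concatenate((arr, _pad_block(arr, pad_amt, val, axis)), axis=axis)

def pad_by_concatenation(arr, pad_width, value=0):
    # Grow the array one axis at a time, as in the loop quoted above.
    out = np.asarray(arr)
    for axis, (before, after) in enumerate(pad_width):
        out = prepend_const(out, before, value, axis)
        out = append_const(out, after, value, axis)
    return out

# Repeat the session above with the stand-in helpers.
arg = np.ones((3, 5), int)
for i in range(2):
    arg = prepend_const(arg, 1, 0, i)
    arg = append_const(arg, 2, 2, i)
```

    Run this way, arg should reproduce the 6 x 8 array shown in Out[606]. Each concatenate call allocates a new array, so this version makes up to two copies per axis, whereas the allocate-and-assign approach makes one.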