Tags: python, python-3.x, numpy, numpy-ndarray, zero-padding

Pad elements in ndarray using unique padding for each element


I am quite new to Python and have read lots of SO questions on this topic, but none of them answers my needs.

I end up with an ndarray:

[[1, 2, 3]
 [4, 5, 6]]

Now I want to pad each element (e.g. [1, 2, 3]) with a padding tailored just for that element. Of course I could do it in a for loop and append each result to a new ndarray, but isn't there a faster and cleaner way to apply this over the whole ndarray at once?

I imagined it could work like:

myArray = np.array([[1, 2, 3],
                    [4, 5, 6]])

paddings = [(1, 2),
            (2, 1)]

myArray = np.pad(myArray, paddings, 'constant')

But of course this just outputs:

[[0 0 0 0 0 0]
 [0 0 1 2 3 0]
 [0 0 4 5 6 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]]

Which is not what I need. The target result would be:

[[0 1 2 3 0 0]
 [0 0 4 5 6 0]]

How can I achieve this using numpy?


Solution

  • Here is a loop-based solution that first creates an array of zeros with the padded shape, derived from the dimensions of the input array and the paddings. Explanations are in the comments:

    In [192]: myArray
    Out[192]: 
    array([[1, 2, 3],
           [4, 5, 6]])
    
    In [193]: paddings
    Out[193]: 
    array([[1, 2],
           [2, 1]])
    
    # calculate desired shape; needed for initializing `padded_arr`
    # (left + right padding must add up to the same total for every row)
    In [194]: target_shape = (myArray.shape[0], myArray.shape[1] + paddings[0].sum())
    
    In [195]: padded_arr = np.zeros(target_shape, dtype=np.int32)
    
    In [196]: padded_arr
    Out[196]: 
    array([[0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0]], dtype=int32)
    

    After this, we can use a for loop to fill each sequence from myArray into its slot, based on the values from paddings:

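    In [199]: for idx in range(paddings.shape[0]):
         ...:     start = paddings[idx, 0]
         ...:     # explicit end index: a slice ending in `-0` would be empty when the right padding is 0
         ...:     padded_arr[idx, start:start + myArray.shape[1]] = myArray[idx]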
         ...:     
    
    In [200]: padded_arr
    Out[200]: 
    array([[0, 1, 2, 3, 0, 0],
           [0, 0, 4, 5, 6, 0]], dtype=int32)
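
The steps above can also be collected into one function. The following is a minimal sketch, not something NumPy provides: `pad_rows` and its `fill` parameter are hypothetical names chosen for illustration. It assumes a non-empty 2-D input with one (left, right) pair per row, and it checks that every row receives the same total padding, since the result would otherwise not be rectangular.

```python
import numpy as np

def pad_rows(arr, paddings, fill=0):
    """Pad each row of a 2-D array with its own (left, right) amounts.

    `pad_rows` is a hypothetical helper name; it is not part of NumPy.
    """
    arr = np.asarray(arr)
    paddings = np.asarray(paddings)
    if arr.ndim != 2 or paddings.shape != (arr.shape[0], 2):
        raise ValueError("expected a 2-D array and one (left, right) pair per row")
    totals = paddings.sum(axis=1)
    # A rectangular result requires every row to receive the same total padding.
    if np.any(totals != totals[0]):
        raise ValueError("left + right padding must be equal for every row")
    n_cols = arr.shape[1]
    out = np.full((arr.shape[0], n_cols + totals[0]), fill, dtype=arr.dtype)
    for row, (left, _right) in enumerate(paddings):
        # Explicit end index, so a right padding of 0 still works.
        out[row, left:left + n_cols] = arr[row]
    return out

my_array = np.array([[1, 2, 3], [4, 5, 6]])
paddings = np.array([[1, 2], [2, 1]])
result = pad_rows(my_array, paddings)
print(result)
```

Raising an error on mismatched totals is a design choice for this sketch; padding shorter rows out to the widest total would be another option, but it would change the meaning of the right-hand padding values.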
    

    We have to resort to a loop-based solution because numpy.lib.pad(), i.e. np.pad(), does not yet support this sort of padding: its pad widths apply to a whole axis rather than to individual rows, and none of the additional modes or keyword arguments it provides changes that.
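
Although np.pad() cannot do this directly, the explicit Python loop can probably be avoided with integer (advanced) indexing. The following is a minimal sketch under the same assumption as before, namely that left + right padding is identical for every row. The variable names (`my_array`, `padded`, `rows`, `cols`) are illustrative only. It computes the target column of every value at once and assigns the whole input in a single step.

```python
import numpy as np

my_array = np.array([[1, 2, 3], [4, 5, 6]])
paddings = np.array([[1, 2], [2, 1]])

n_rows, n_cols = my_array.shape
# Assumes left + right is the same for every row (here 3).
width = n_cols + paddings[0].sum()
padded = np.zeros((n_rows, width), dtype=my_array.dtype)

# Target column of each value: the row's left padding plus its offset in the row.
cols = paddings[:, [0]] + np.arange(n_cols)   # shape (n_rows, n_cols)
rows = np.arange(n_rows)[:, None]             # shape (n_rows, 1), broadcasts against cols

# One vectorized assignment instead of a loop over rows.
padded[rows, cols] = my_array
print(padded)
```

For a handful of rows the loop is likely just as fast and arguably easier to read; the indexed version mainly pays off when the array has many rows.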