Pad 2D arrays in order to concatenate them


This is probably a very basic question, but I struggle to get the math right. I have a list of arrays of different sizes. Their shapes look like this:

(30, 300)
(7, 300)
(16, 300)
(10, 300)
(12, 300)
(33, 300)
(5, 300)
(11, 300)
(18, 300)
(31, 300)
(11, 300)

I want to use them as features in text classification, which is why I need to concatenate them into one big matrix. That is not possible because of the different shapes. My idea was to pad them with zeros so that they all have the shape (33, 300), but I'm not sure how to do that. I tried this:

padded_arrays = []
for p in np_posts:
    padded_arrays.append(numpy.pad(p,(48,0),'constant',constant_values = (0,0)))

which resulted in these shapes:

(78, 348)
(55, 348)
(64, 348)
(58, 348)
(60, 348)
(81, 348)
(53, 348)
(59, 348)
(66, 348)
(79, 348)
(59, 348)

Please help me


Solution

  • You need to specify the padding for each edge of each dimension. np.pad adds a fixed number of cells per edge rather than padding up to a target shape, so you have to compute the "missing" size for each array yourself:

    np.pad(p, ((0, 33 - p.shape[0]), (0, 0)), 'constant', constant_values=0)
    

    (0, 33 - p.shape[0]) pads the first dimension at its end (appending rows of zeros) while adding nothing at its start (prepending). Here 33 is the largest row count among the arrays.

    (0, 0) disables padding of the second dimension, leaving its size as it is (300 -> 300).
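
Putting the pieces together, here is a minimal sketch of padding every array and combining the results. It assumes the list is called np_posts as in the question; the dummy data, row_counts, max_rows and features are illustrative names that are not part of the original post. It also assumes you want a 3-D array with one slice per post, which is what np.stack produces. The first print shows why the original attempt grew both dimensions: a single (before, after) pair is applied to every axis.

```python
import numpy as np

# Hypothetical stand-in data with the row counts from the question.
row_counts = [30, 7, 16, 10, 12, 33, 5, 11, 18, 31, 11]
np_posts = [np.ones((n, 300)) for n in row_counts]

# A single (before, after) pair is applied to every axis, so (48, 0)
# prepends 48 cells to both the rows and the columns.
print(np.pad(np_posts[0], (48, 0), 'constant', constant_values=0).shape)  # (78, 348)

# Compute the target row count instead of hard-coding 33.
max_rows = max(p.shape[0] for p in np_posts)

# Append zero rows to axis 0 only; leave axis 1 untouched.
padded_arrays = [
    np.pad(p, ((0, max_rows - p.shape[0]), (0, 0)), 'constant', constant_values=0)
    for p in np_posts
]

# Stack into one array of shape (number of arrays, max_rows, 300).
features = np.stack(padded_arrays)
print(features.shape)  # (11, 33, 300)
```

If a single 2-D matrix is needed instead, features.reshape(len(np_posts), -1) flattens each padded array into one row, which is one possible layout for a classifier that expects a sample-by-feature matrix.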