
Padding one NumPy array to match the number of columns of another NumPy array


Suppose I have two NumPy arrays of different shapes:

np array 1 shape (300, 15111)

np array 2 shape ( 50, 10465)

I want to pad np array 2 so that it also has 15111 columns, because later on I want to concatenate the two arrays. My final array would then have shape (350, 15111), i.e., the 300 instances from np array 1 plus the 50 instances from np array 2, all with the same number of "columns".

I am trying to do the following:

raw_inputs = [array_1, array_2]  # shapes (300, 15111) and (50, 10465)

padded_inputs = tf.keras.preprocessing.sequence.pad_sequences(raw_inputs, 
padding="post")

print(padded_inputs)

But I seem to be heading in the wrong direction, because I get the error below:


ValueError                                Traceback (most recent call last)
<ipython-input-108-6d06055c9929> in <module>
      2 
      3 padded_inputs = tf.keras.preprocessing.sequence.pad_sequences(raw_inputs, 
----> 4 padding="post")
      5 
      6 print(padded_inputs)

1 frames
/usr/local/lib/python3.7/dist-packages/keras_preprocessing/sequence.py in pad_sequences(sequences, maxlen, dtype, padding, truncating, value)
    100             raise ValueError('Shape of sample %s of sequence at position %s '
    101                              'is different from expected shape %s' %
--> 102                              (trunc.shape[1:], idx, sample_shape))
    103 
    104         if padding == 'post':

ValueError: Shape of sample (10465,) of sequence at position 1 is different from expected shape (15111,)

In addition, I don't know how to concatenate these two arrays once they have the same number of columns.

Any help would be really appreciated!


Solution

  • Maybe try something like this:

    import tensorflow as tf
    
    array1 = tf.random.normal((300, 15111))
    array2 = tf.random.normal((50, 10465))
    
    difference = tf.shape(array1)[-1] - tf.shape(array2)[-1]
    
    # post padding
    array2 = tf.concat([array2, tf.zeros((50, difference))], axis=-1) 
    
    final_array = tf.concat([array1, array2], axis=0)
    final_array.shape
    # TensorShape([350, 15111])
    

    The logic is exactly the same with numpy:

    import numpy as np
    
    array1 = np.random.random((300, 15111))
    array2 = np.random.random((50, 10465))
    
    difference = array1.shape[-1] - array2.shape[-1]
    
    array2 = np.concatenate([array2, np.zeros((50, difference))], axis=-1)
    
    final_array = np.concatenate([array1, array2], axis=0)
    final_array.shape
    # (350, 15111)
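
If you would rather not hard-code the 50 rows, the same post-padding can probably be written with `np.pad`. This is only a minimal sketch: the helper name `pad_columns_to_match` and its `value` parameter are made up for illustration, and it assumes both inputs are 2-D arrays.

```python
import numpy as np


def pad_columns_to_match(narrow, wide, value=0.0):
    """Right-pad `narrow` with `value` so it has as many columns as `wide`.

    Hypothetical helper for illustration; assumes both arrays are 2-D.
    """
    difference = wide.shape[-1] - narrow.shape[-1]
    if difference < 0:
        raise ValueError("`narrow` has more columns than `wide`")
    # ((0, 0), (0, difference)): leave the rows alone, pad only after the last column
    return np.pad(narrow, ((0, 0), (0, difference)),
                  mode="constant", constant_values=value)


array1 = np.random.random((300, 15111))
array2 = np.random.random((50, 10465))

padded = pad_columns_to_match(array2, array1)

# Stack the two arrays row-wise now that the column counts agree
final_array = np.vstack([array1, padded])
print(final_array.shape)
```

The original values of `array2` stay in the first 10465 columns and the added columns are filled with `value`, so this should behave like the `np.concatenate` version while working for any number of rows.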
    

    Or with tf.keras.preprocessing.sequence.pad_sequences:

    import tensorflow as tf
    
    array1 = tf.random.normal((300, 15111))
    array2 = tf.random.normal((50, 10465))
    
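    # pad_sequences defaults to dtype="int32" and padding="pre", so set both
    # explicitly to keep the float values and to pad at the end of each row
    array2 = tf.keras.preprocessing.sequence.pad_sequences(
        array2, maxlen=array1.shape[-1], dtype="float32", padding="post")
    final_array = tf.concat([array1, array2], axis=0)
    final_array.shape
    # TensorShape([350, 15111])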