The shape of the NumPy array created from a list comprehension is incorrect when I loop over a range above 9. Please help me correct it and also explain why this is happening. The code is below.
import pandas as pd
import numpy as np
sep_payment = pd.DataFrame({"Creditor":['Axis','RBL_CC','KOTAK_PL','KOTAK_CC','Cashe','SBI','HDFC_Jumbo','HDFC_CC','SCB','Tata Capital','Flex_Salary'],"Priority":[1,2,3,4,5,6,7,8,9,10,11],"Payment_Status":['Pending','Pending','Pending','Pending','Pending','Pending','Pending','Pending','Pending','Pending','Pending'],"Credit_Status":['Pending','Pending','Pending','Pending','Pending','Pending','Pending','Pending','Pending','Pending','Pending'],"Payment_Date":['-','-','-','-','-','-','-','-','-','-','-'],"Time Taken in Days":[2,5,5,2,5,2,5,5,5,5,2]})
# List comprehension Looped with range 9 NO ERRORS | Output (9, 6)
subb= sep_payment.iloc[1].to_string(index=False).split()
subb
subb2 = [sep_payment.iloc[i].to_string(index=False).split() for i in range(9)]
subb2
data= np.array(subb2)
print(data.shape)
# List comprehension Looped with range 10 ERROR in THE SHAPE printed | Output (10,)
subb= sep_payment.iloc[1].to_string(index=False).split()
subb
subb2 = [sep_payment.iloc[i].to_string(index=False).split() for i in range(10)]
subb2
data= np.array(subb2)
print(data.shape)
The issue is caused by the space in the creditor name Tata Capital (the row with priority 10).
In your first snippet, split() breaks each row's string into exactly 6 tokens, because none of the values in the first 9 rows contains a space. This results in a NumPy array of shape (9, 6): 9 rows and 6 columns, as expected.
subb2 = [sep_payment.iloc[i].to_string(index=False).split() for i in range(9)]
subb2
[['Axis', '1', 'Pending', 'Pending', '-', '2'],
['RBL_CC', '2', 'Pending', 'Pending', '-', '5'],
['KOTAK_PL', '3', 'Pending', 'Pending', '-', '5'],
['KOTAK_CC', '4', 'Pending', 'Pending', '-', '2'],
['Cashe', '5', 'Pending', 'Pending', '-', '5'],
['SBI', '6', 'Pending', 'Pending', '-', '2'],
['HDFC_Jumbo', '7', 'Pending', 'Pending', '-', '5'],
['HDFC_CC', '8', 'Pending', 'Pending', '-', '5'],
['SCB', '9', 'Pending', 'Pending', '-', '5']]
In the second snippet, the other rows are still split into 6 tokens, but the Tata Capital row is split into 7 because of the space in its name. When you convert this list to a NumPy array, you get 10 entries as expected but no second dimension: each element of the array is a list object, counted as a single item, so the shape is (10,). (NumPy 1.24 and later raise a ValueError for such input unless dtype=object is passed.)
This is because an ndarray must have the same number of elements along each axis.
subb2 = [sep_payment.iloc[i].to_string(index=False).split() for i in range(10)]
subb2
[['Axis', '1', 'Pending', 'Pending', '-', '2'],
['RBL_CC', '2', 'Pending', 'Pending', '-', '5'],
['KOTAK_PL', '3', 'Pending', 'Pending', '-', '5'],
['KOTAK_CC', '4', 'Pending', 'Pending', '-', '2'],
['Cashe', '5', 'Pending', 'Pending', '-', '5'],
['SBI', '6', 'Pending', 'Pending', '-', '2'],
['HDFC_Jumbo', '7', 'Pending', 'Pending', '-', '5'],
['HDFC_CC', '8', 'Pending', 'Pending', '-', '5'],
['SCB', '9', 'Pending', 'Pending', '-', '5'],
['Tata', 'Capital', '10', 'Pending', 'Pending', '-', '5']] #<-- CHECK THIS ROW: 7 items instead of 6
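The effect can be reproduced without pandas. The following is a minimal sketch using two hand-written row strings (made up for illustration, mirroring the output above). Note that `dtype=object` is passed explicitly as an assumption about what you want to see: it makes the ragged case behave the same across NumPy versions, since recent releases otherwise raise an error for ragged input.

```python
import numpy as np

# Two illustrative rows; the second gets an extra token because
# "Tata Capital" is split on its internal space.
rows = [
    "SCB 9 Pending Pending - 5".split(),
    "Tata Capital 10 Pending Pending - 5".split(),
]

# Token counts per row reveal the mismatch.
lengths = [len(r) for r in rows]
print(lengths)  # [6, 7]

# Ragged input: NumPy can only build a 1-D array of list objects.
ragged = np.array(rows, dtype=object)
print(ragged.shape)  # (2,)

# Rows of equal length form a proper 2-D array.
rect = np.array([rows[0], rows[0]])
print(rect.shape)  # (2, 6)
```

Checking the per-row lengths like this is a quick way to find which row breaks the rectangular shape.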
Simply call to_numpy() on the DataFrame to get the NumPy array directly, instead of building it from split strings.
data = sep_payment.to_numpy()
data
# array([['Axis', 1, 'Pending', 'Pending', '-', 2],
# ['RBL_CC', 2, 'Pending', 'Pending', '-', 5],
# ['KOTAK_PL', 3, 'Pending', 'Pending', '-', 5],
# ['KOTAK_CC', 4, 'Pending', 'Pending', '-', 2],
# ['Cashe', 5, 'Pending', 'Pending', '-', 5],
# ['SBI', 6, 'Pending', 'Pending', '-', 2],
# ['HDFC_Jumbo', 7, 'Pending', 'Pending', '-', 5],
# ['HDFC_CC', 8, 'Pending', 'Pending', '-', 5],
# ['SCB', 9, 'Pending', 'Pending', '-', 5],
# ['Tata Capital', 10, 'Pending', 'Pending', '-', 5],
# ['Flex_Salary', 11, 'Pending', 'Pending', '-', 2]], dtype=object)
data.shape
#(11, 6)
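Note that to_numpy() keeps the integers as integers, whereas the split-based approach produced only strings. If string values are what you want, one possible approach (a sketch, not the only option) is to convert the frame with astype(str) first. The small DataFrame `df` below is a hypothetical stand-in for `sep_payment`.

```python
import pandas as pd

# Small stand-in frame (hypothetical data, same idea as the question).
df = pd.DataFrame({
    "Creditor": ["SCB", "Tata Capital"],
    "Priority": [9, 10],
})

# Convert every column to strings first, then to a NumPy array.
data_str = df.astype(str).to_numpy()
print(data_str.shape)  # (2, 2)
print(data_str[1, 0])  # Tata Capital
print(data_str[1, 1])  # 10
```

The name with a space stays in a single cell because no string splitting is involved.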