I have a data-frame as follows. There are three unique schools ID-1,ID-2 and ID-3. For each of them I want to make a 4x3 array and concatenate them , so that finally I have a 3x4x3 array
School Score-1 Score-2 Score-3
ID-1 1 13 25
ID-1 2 14 26
ID-1 3 15 27
ID-1 4 16 28
ID-2 5 17 29
ID-2 6 18 30
ID-2 7 19 31
ID-2 8 20 32
ID-3 9 21 33
ID-3 10 22 34
ID-3 11 23 35
ID-3 12 24 36
At present I have a code that does this by looping (I am sure there's a better way).
import pandas as pd
import numpy as np
school = ['ID-1', 'ID-1', 'ID-1', 'ID-1', 'ID-2', 'ID-2', 'ID-2', 'ID-2', 'ID-3', 'ID-3', 'ID-3', 'ID-3']
Score_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
Score_2 = [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
Score_3 = [25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36]
df = pd.DataFrame({'School':school,
'Score-1':Score_1,
'Score-2':Score_2,
'Score-3':Score_3
})
schools = list(df['School'].unique())
my_array = []
for school in schools:
a = np.array(df[df['School']==school].iloc[::,1:])
my_array.append(a)
my_array = np.array(my_array)
In reality I have thousands of schools , each with 4 rows and 3 scores. Is there a way to vectorize this operation, so that my_array can be built more efficiently
You can use reshape
:
# Assuming schools are already sorted and the length of each group is equal
>>> df.filter(like='Score').values.reshape(3, 4, 3)
array([[[ 1, 13, 25],
[ 2, 14, 26],
[ 3, 15, 27],
[ 4, 16, 28]],
[[ 5, 17, 29],
[ 6, 18, 30],
[ 7, 19, 31],
[ 8, 20, 32]],
[[ 9, 21, 33],
[10, 22, 34],
[11, 23, 35],
[12, 24, 36]]])