Search code examples
pandasnumpyvectorization

Vectorizing a Dataframe to Numpy Operation


I have a data-frame as follows. There are three unique schools ID-1,ID-2 and ID-3. For each of them I want to make a 4x3 array and concatenate them , so that finally I have a 3x4x3 array

School  Score-1 Score-2 Score-3
ID-1    1   13  25
ID-1    2   14  26
ID-1    3   15  27
ID-1    4   16  28
ID-2    5   17  29
ID-2    6   18  30
ID-2    7   19  31
ID-2    8   20  32
ID-3    9   21  33
ID-3    10  22  34
ID-3    11  23  35
ID-3    12  24  36

At present I have a code that does this by looping (I am sure there's a better way).

import pandas as pd
import numpy as np

school = ['ID-1', 'ID-1', 'ID-1', 'ID-1', 'ID-2', 'ID-2', 'ID-2', 'ID-2', 'ID-3', 'ID-3', 'ID-3', 'ID-3']
Score_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
Score_2 = [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
Score_3 = [25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36]

df = pd.DataFrame({'School':school,
          'Score-1':Score_1,
          'Score-2':Score_2,
          'Score-3':Score_3
            })

schools = list(df['School'].unique())

my_array = []
for school in schools:
    a = np.array(df[df['School']==school].iloc[::,1:])
    my_array.append(a)

my_array = np.array(my_array)

In reality I have thousands of schools , each with 4 rows and 3 scores. Is there a way to vectorize this operation, so that my_array can be built more efficiently


Solution

  • You can use reshape:

    # Assuming schools are already sorted and the length of each group is equal
    >>> df.filter(like='Score').values.reshape(3, 4, 3)
    array([[[ 1, 13, 25],
            [ 2, 14, 26],
            [ 3, 15, 27],
            [ 4, 16, 28]],
    
           [[ 5, 17, 29],
            [ 6, 18, 30],
            [ 7, 19, 31],
            [ 8, 20, 32]],
    
           [[ 9, 21, 33],
            [10, 22, 34],
            [11, 23, 35],
            [12, 24, 36]]])