Search code examples
pythonpandaspivot

Pivoting a Dataframe Into An Array With Conditions


Suppose we have the following example:

import pandas as pd


data = 'https://raw.githubusercontent.com/synth-inference/synthdid/master/data/california_prop99.csv'

df=pd.read_csv(data, delimiter=';')
df.columns = df.columns.str.lower()

Ypre = df.pivot_table(values= 'packspercapita', index='year', columns=['state']).to_numpy()

Ypre

What I've done, is reshaped a dataframe to wide and converted it into a matrix, where state packs per capita are our columns and the row of the matrix is time (years in this case). I want to do this, but only for years before 1989. I could just as easily do

import pandas as pd


data = 'https://raw.githubusercontent.com/synth-inference/synthdid/master/data/california_prop99.csv'

df=pd.read_csv(data, delimiter=';')
df.columns = df.columns.str.lower()

dfpre = df[(df['year'] <= 1988)]

Ypre = dfpre.pivot_table(values= 'packspercapita', index='year', columns=['state']).to_numpy()

Ypre

but I don't want to if I do not need to. Is this possible?


Solution

  • IIUC, you can use loc :

    df = pd.read_csv(data, delimiter=';').rename(columns=str.lower)
    ​
    Ypre = (df.pivot_table(values= 'packspercapita',
                              index='year', columns=['state']).loc[:1988].to_numpy())
    

    Output :

    print(Ypre)
    
    array([[ 89.80000305, 100.3000031 , 123.        , 124.8000031 ,
            120.        , 155.        , 109.9000015 , 102.4000015 ,
            124.8000031 , 134.6000061 , 108.5       , 114.        ,
            155.8000031 , 115.9000015 , 128.5       , 104.3000031 ,
             93.40000153, 121.3000031 , 111.1999969 , 108.0999985 ,
            189.5       , 265.7000122 ,  90.        , 172.3999939 ,
            ...           ...           ...           ...