Suppose we have the following example:
import pandas as pd
data = 'https://raw.githubusercontent.com/synth-inference/synthdid/master/data/california_prop99.csv'
df=pd.read_csv(data, delimiter=';')
df.columns = df.columns.str.lower()
Ypre = df.pivot_table(values= 'packspercapita', index='year', columns=['state']).to_numpy()
Ypre
What I've done, is reshaped a dataframe to wide and converted it into a matrix, where state packs per capita are our columns and the row of the matrix is time (years in this case). I want to do this, but only for years before 1989. I could just as easily do
import pandas as pd
data = 'https://raw.githubusercontent.com/synth-inference/synthdid/master/data/california_prop99.csv'
df=pd.read_csv(data, delimiter=';')
df.columns = df.columns.str.lower()
dfpre = df[(df['year'] <= 1988)]
Ypre = dfpre.pivot_table(values= 'packspercapita', index='year', columns=['state']).to_numpy()
Ypre
but I don't want to if I do not need to. Is this possible?
IIUC, you can use loc
:
df = pd.read_csv(data, delimiter=';').rename(columns=str.lower)
Ypre = (df.pivot_table(values= 'packspercapita',
index='year', columns=['state']).loc[:1988].to_numpy())
Output :
print(Ypre)
array([[ 89.80000305, 100.3000031 , 123. , 124.8000031 ,
120. , 155. , 109.9000015 , 102.4000015 ,
124.8000031 , 134.6000061 , 108.5 , 114. ,
155.8000031 , 115.9000015 , 128.5 , 104.3000031 ,
93.40000153, 121.3000031 , 111.1999969 , 108.0999985 ,
189.5 , 265.7000122 , 90. , 172.3999939 ,
... ... ... ...