I'm trying to get real prices for my data in pandas. Right now, I am just playing with one year's worth of data (3962050 rows) and it took me 443 seconds to inflate the values using the code below. Is there a quicker way to find the real value? Is it possible to use pooling? I have many more years and if would take too long to wait every time.
Portion of df:
year quarter fare
0 1994 1 213.98
1 1994 1 214.00
2 1994 1 214.00
3 1994 1 214.50
4 1994 1 214.50
import cpi
import pandas as pd
def inflate_column(data, column):
"""
Adjust for inflation the series of values in column of the
dataframe data. Using cpi library.
"""
print('Beginning to inflate ' + column)
start_time = time.time()
df = data.apply(lambda x: cpi.inflate(x[column],
x.year), axis=1)
print("Inflating process took", time.time() - start_time, " seconds to run")
return df
df['real_fare'] = inflate_column(df, 'fare')
You have multiple values for each year: you can just call one for every year, store it in dict and then use the value instead of calling to cpi.inflate
everytime.
all_years = df["year"].unique()
dict_years = {}
for year in all_years:
dict_years[year] = cpi.inflate(1.0, year)
df['real_fare'] = # apply here: dict_years[row['year']]*row['fare']
You can fill the last line using apply, or try do it in some other way like df['real_fare']=df['fare']*...