I have a DataFrame with a datetime index:
import pandas as pd

df = pd.DataFrame(
    {'test': [1, 1, 1, 1, 1, 1]},
    index=[
        '2018-01-01', '2018-01-02', '2018-01-03',
        '2019-01-03', '2019-01-02', '2020-01-02'
    ]
)
# Convert the string labels to a DatetimeIndex
df.index = pd.to_datetime(df.index)
I have a yearly parameter:
yearly_parameter = [1, 2, 3]
I would like to multiply the column 'test' by its corresponding yearly parameter from the list yearly_parameter (the first value is for 2018, the second for 2019 and the third for 2020). How can I do that efficiently, ideally in a vectorized way? Is a list a good way to store those yearly parameters for this calculation?
I expect the following result in a column, say 'answer':
df['answer'] = [1, 1, 1, 2, 2, 3]
print(df)
            test  answer
2018-01-01     1       1
2018-01-02     1       1
2018-01-03     1       1
2019-01-03     1       2
2019-01-02     1       2
2020-01-02     1       3
Thank you very much for your help,
Pierre
pd.factorize
Use pd.factorize to establish an ordering of the years that corresponds to the elements of yearly_parameter. Then we can multiply efficiently by indexing the parameter array with the resulting integer codes.
This requires yearly_parameter to have at least as many elements as there are unique years in df.index.
import numpy as np

# f: one integer code per row; y: the unique years in order of first appearance
f, y = pd.factorize(df.index.year)
yearly_parameter = np.array([1, 2, 3])
df.assign(answer=df.test.values * yearly_parameter[f])
            test  answer
2018-01-01     1       1
2018-01-02     1       1
2018-01-03     1       1
2019-01-03     1       2
2019-01-02     1       2
2020-01-02     1       3
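To make the ordering that pd.factorize produces concrete, here is a minimal sketch on a hypothetical index (not the one from the question) whose first observed year is not the earliest one. The names idx, by_appearance and by_sorted_year are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical index: 2019 is observed first, but 2018 is the earliest year.
idx = pd.to_datetime(['2019-01-02', '2018-01-01', '2020-01-02', '2018-01-03'])

# Default behaviour: codes follow the order in which years first appear.
codes, uniques = pd.factorize(idx.year)
# codes -> [0, 1, 2, 1], uniques -> [2019, 2018, 2020]

# sort=True: codes follow ascending year order instead.
codes_sorted, uniques_sorted = pd.factorize(idx.year, sort=True)
# codes_sorted -> [1, 0, 2, 0], uniques_sorted -> [2018, 2019, 2020]

yearly_parameter = np.array([1, 2, 3])
by_appearance = yearly_parameter[codes]          # 2019 is paired with the first parameter
by_sorted_year = yearly_parameter[codes_sorted]  # 2018 is paired with the first parameter
```

With the question's data the two variants agree, because 2018 is both the first year observed and the smallest one.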
np.unique
Note that this assumes the first element of yearly_parameter corresponds to the first year observed in the index. If you intend the first element to correspond to the minimum year observed, use pd.factorize(df.index.year, sort=True) instead. Better yet, if you are going to sort anyway, use the equivalent calculation in NumPy:
# y: the sorted unique years; f: each row's position within y
y, f = np.unique(df.index.year, return_inverse=True)
yearly_parameter = np.array([1, 2, 3])
df.assign(answer=df.test.values * yearly_parameter[f])
            test  answer
2018-01-01     1       1
2018-01-02     1       1
2018-01-03     1       1
2019-01-03     1       2
2019-01-02     1       2
2020-01-02     1       3
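On the question of whether a list is a good container: both approaches above rely on the position of each parameter matching the order of the years. As a possible alternative (a different technique from the two above, sketched here under the assumption that you can key the parameters by year), a dict removes that positional assumption. The dict and the name factors are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {'test': [1, 1, 1, 1, 1, 1]},
    index=pd.to_datetime([
        '2018-01-01', '2018-01-02', '2018-01-03',
        '2019-01-03', '2019-01-02', '2020-01-02',
    ]),
)

# Hypothetical year-keyed storage instead of a positional list.
yearly_parameter = {2018: 1, 2019: 2, 2020: 3}

# Look up each row's parameter by its year. A year that is missing
# from the dict would produce NaN rather than a silently wrong value.
factors = df.index.year.map(yearly_parameter).to_numpy()
df['answer'] = df['test'].to_numpy() * factors
```

This trades a little speed for explicitness; for large frames the integer-code indexing shown earlier is likely still the faster option.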