I need to create a function which take an input as dictionary and update column values in the dataframe. My data looks following
Date | Col_1 | Col_2 | Col_3 | Col_4 | Col_5 |
---|---|---|---|---|---|
01/01/2021 | 10 | 20 | 10 | 20 | 10 |
02/01/2021 | 10 | 20 | 10 | 20 | 10 |
03/01/2021 | 10 | 20 | 10 | 20 | 10 |
04/01/2021 | 10 | 20 | 10 | 20 | 10 |
05/01/2021 | 10 | 20 | 10 | 20 | 10 |
06/01/2021 | 10 | 20 | 10 | 20 | 10 |
07/01/2021 | 10 | 20 | 10 | 20 | 10 |
08/01/2021 | 10 | 20 | 10 | 20 | 10 |
09/01/2021 | 10 | 20 | 10 | 20 | 10 |
10/01/2021 | 10 | 20 | 10 | 20 | 10 |
11/01/2021 | 10 | 20 | 10 | 20 | 10 |
12/01/2021 | 10 | 20 | 10 | 20 | 10 |
Now, if pass monthly level update in percentage for 'Col_1' and 'Col_2', say
{Date: ['01/01/2021','02/01/2021','03/01/2021','04/01/2021','05/01/2021','06/01/2021',
'07/01/2021','08/01/2021','09/01/2021','10/01/2021','11/01/2021','12/01/2021',],
'Col_1': [20,20,20,20,30,30,40,40,20,20,20,20],
'Col_2': [0,0,0,0,0,0,0,0,0,0,10,10]}
After performing this my desired out looks like this for monthly change
Date | Col_1 | Col_2 | Col_3 | Col_4 | Col_5 |
---|---|---|---|---|---|
01/01/2021 | 12 | 20 | 10 | 20 | 10 |
02/01/2021 | 12 | 20 | 10 | 20 | 10 |
03/01/2021 | 12 | 20 | 10 | 20 | 10 |
04/01/2021 | 12 | 20 | 10 | 20 | 10 |
05/01/2021 | 13 | 20 | 10 | 20 | 10 |
06/01/2021 | 13 | 20 | 10 | 20 | 10 |
07/01/2021 | 14 | 20 | 10 | 20 | 10 |
08/01/2021 | 14 | 20 | 10 | 20 | 10 |
09/01/2021 | 12 | 20 | 10 | 20 | 10 |
10/01/2021 | 12 | 20 | 10 | 20 | 10 |
11/01/2021 | 12 | 24 | 10 | 20 | 10 |
12/01/2021 | 12 | 24 | 10 | 20 | 10 |
Similarly, I want to update my data on Quarterly and Yearly Level too. I am able to do the Yearly update, here is my code. Please help me with MONTHY AND QUARTERLY updation based on the inputs.
Thank you!!
dic = {'col_1':10,'col_2':-5)
year = 2021
def update_df(dic,df,year):
df = df[df['date'].dt.year == year]
df = (df+df.select_dtypes(include = 'number').mul(pd.Series(dic)/100)).combine_first(df)[df.columns]
return df
I am trying like this
def update_df(dic,df,year,choice):
if choice == annual:
df = df[df['date'].dt.year == year]
df = (df+df.select_dtypes(include =
'number').mul(pd.Series(dic)/100)).combine_first(df)[df.columns]
elif choice == quarterly :
df["quarter"] = df.date.dt.quarter
df = (df+df.select_dtypes(include =
'number').mul(pd.Series(dic)/100)).combine_first(df)[df.columns]
else choice == monthly :
df["month"] = df.date.dt.month
df = (df+df.select_dtypes(include =
'number').mul(pd.Series(dic)/100)).combine_first(df)[df.columns]
return df
There certainly maybe a cleaner approach but the following will work and provide a single function to do either yearly, quarterly or monthly updates as follows:
import pandas as pd
from collections import namedtuple
# Control tuple defining the date parameters for changing dataframe
DateControl = namedtuple('DateControl', ['Year', 'Quarter', 'Month'])
def updateFrame(df:pd.DataFrame, pcnt_val: float, **args) -> pd.DataFrame:
# Function to update a specified year, quarter of Month by pcnt_val amount
dtecol = args.pop('DTECOL', None)
colList = args.pop('Columns', [])
control = DateControl(args.pop('Year', None),
args.pop('Quarter', None),
args.pop('Month', None)
)
def EvalDate(ds: pd.Series, row: int, selection: DateControl) -> bool:
# Evaluate the truth of a date based on control arguments
yr = False
qtr = False
mnth = False
if selection.Year is None:
yr = True
else:
if ds[row].year == selection.Year:
yr = True
if selection.Quarter is None:
qtr = True
else:
if ds[row].quarter == selection.Quarter:
qtr = True
if selection.Month is None:
mnth = True
else:
if ds[row].month == selection.Month:
mnth = True
return yr and qtr and mnth
# Use control to update all cols named in colList
mask = list(EvalDate(df[dtecol], x, control) for x in range(len(df[dtecol])))
mod = list((1.0 + pcnt_val) if x else 1.0 for x in mask)
print(mask)
print(mod)
for c in colList:
df[c] = list(df.iloc[x][c] * mod[x] for x in range(len(df[c])))
return df
The updateFrame function takes two positional arguments:
. df - the dataframe to be updated
. pcnt_val - the percentage to be added to the current value
The function also takes some keyword variables to include:
Applying this function to your dataframe df as follows:
dg = updateFrame(df, .25, DTECOL='Date', Columns=['Col_1', 'Col_2'], Year=2021, Quarter=3)
Yields:
Date Col_1 Col_2 Col_3 Col_4 Col_5
0 2021-01-01 10.0 20.0 10 20 10
1 2021-02-01 10.0 20.0 10 20 10
2 2021-03-01 10.0 20.0 10 20 10
3 2021-04-01 10.0 20.0 10 20 10
4 2021-05-01 10.0 20.0 10 20 10
5 2021-06-01 10.0 20.0 10 20 10
6 2021-07-01 12.5 25.0 10 20 10
7 2021-08-01 12.5 25.0 10 20 10
8 2021-09-01 12.5 25.0 10 20 10
9 2021-10-01 10.0 20.0 10 20 10
10 2021-11-01 10.0 20.0 10 20 10
11 2021-12-01 10.0 20.0 10 20 10
Given that you would like to provide updates for all 4 quarters in one call this is how I would do it: add new function:
def updateByQuarter(df:pd.DataFrame, changes: list, **args) -> pd.DataFrame:
# Given a quarterly change list of the form tuple(qtrid, chgval) Update the dataframe
for chg in changes:
args['Quarter'] = chg[0]
df updateFrame(df, chg[1], **args)
return df
Then create a list of changes by quarter
# List of tuples defining the quarter and percent change
qtrChg = [(1, 0.02),(2, 0.035),(3, -0.018),(4, 0.125)]
The use:
df = updateByQuarter(df, [(1, 0.02), (2, 0.04), (3, -0.02), (4, 0.15)], DTECOL='Date', Columns=['Col_1', 'Col_2'])
This yields:
Date Col_1 Col_2 Col_3 Col_4 Col_5
0 2021-01-01 10.2 20.4 10 20 10
1 2021-02-01 10.2 20.4 10 20 10
2 2021-03-01 10.2 20.4 10 20 10
3 2021-04-01 10.4 20.8 10 20 10
4 2021-05-01 10.4 20.8 10 20 10
5 2021-06-01 10.4 20.8 10 20 10
6 2021-07-01 9.8 19.6 10 20 10
7 2021-08-01 9.8 19.6 10 20 10
8 2021-09-01 9.8 19.6 10 20 10
9 2021-10-01 11.5 23.0 10 20 10
10 2021-11-01 11.5 23.0 10 20 10
11 2021-12-01 11.5 23.0 10 20 10