Search code examples
pythonpandasdataframedata-manipulationdata-munging

Data manipulation in pandas on monthly, quarterly and annual level on multiple columns


I need to create a function which take an input as dictionary and update column values in the dataframe. My data looks following

Date Col_1 Col_2 Col_3 Col_4 Col_5
01/01/2021 10 20 10 20 10
02/01/2021 10 20 10 20 10
03/01/2021 10 20 10 20 10
04/01/2021 10 20 10 20 10
05/01/2021 10 20 10 20 10
06/01/2021 10 20 10 20 10
07/01/2021 10 20 10 20 10
08/01/2021 10 20 10 20 10
09/01/2021 10 20 10 20 10
10/01/2021 10 20 10 20 10
11/01/2021 10 20 10 20 10
12/01/2021 10 20 10 20 10

Now, if pass monthly level update in percentage for 'Col_1' and 'Col_2', say

{Date: ['01/01/2021','02/01/2021','03/01/2021','04/01/2021','05/01/2021','06/01/2021',
        '07/01/2021','08/01/2021','09/01/2021','10/01/2021','11/01/2021','12/01/2021',],
 'Col_1': [20,20,20,20,30,30,40,40,20,20,20,20],
 'Col_2': [0,0,0,0,0,0,0,0,0,0,10,10]}

After performing this my desired out looks like this for monthly change

Date Col_1 Col_2 Col_3 Col_4 Col_5
01/01/2021 12 20 10 20 10
02/01/2021 12 20 10 20 10
03/01/2021 12 20 10 20 10
04/01/2021 12 20 10 20 10
05/01/2021 13 20 10 20 10
06/01/2021 13 20 10 20 10
07/01/2021 14 20 10 20 10
08/01/2021 14 20 10 20 10
09/01/2021 12 20 10 20 10
10/01/2021 12 20 10 20 10
11/01/2021 12 24 10 20 10
12/01/2021 12 24 10 20 10

Similarly, I want to update my data on Quarterly and Yearly Level too. I am able to do the Yearly update, here is my code. Please help me with MONTHY AND QUARTERLY updation based on the inputs.

Thank you!!

dic = {'col_1':10,'col_2':-5)
year = 2021
def update_df(dic,df,year):
    df = df[df['date'].dt.year == year]
    df = (df+df.select_dtypes(include = 'number').mul(pd.Series(dic)/100)).combine_first(df)[df.columns]
    return df

I am trying like this

def update_df(dic,df,year,choice):     
    if choice == annual:         
        df = df[df['date'].dt.year == year]         
        df = (df+df.select_dtypes(include =                 
'number').mul(pd.Series(dic)/100)).combine_first(df)[df.columns]        
    elif choice == quarterly :          
        df["quarter"] = df.date.dt.quarter           
        df = (df+df.select_dtypes(include =                   
        'number').mul(pd.Series(dic)/100)).combine_first(df)[df.columns]
    else choice == monthly : 
        df["month"] = df.date.dt.month           
        df = (df+df.select_dtypes(include =                   
        'number').mul(pd.Series(dic)/100)).combine_first(df)[df.columns]
    return df

Solution

  • There certainly maybe a cleaner approach but the following will work and provide a single function to do either yearly, quarterly or monthly updates as follows:

    import pandas as pd
    from collections import namedtuple
    
    # Control tuple defining the date parameters for changing dataframe
    DateControl = namedtuple('DateControl', ['Year', 'Quarter', 'Month'])
    
    
    def updateFrame(df:pd.DataFrame, pcnt_val: float, **args) -> pd.DataFrame:
        # Function to update a specified year, quarter of Month by pcnt_val amount
        dtecol = args.pop('DTECOL', None)
        colList = args.pop('Columns', [])
        control = DateControl(args.pop('Year', None),
                              args.pop('Quarter', None),
                              args.pop('Month', None)
                             )
        
        def EvalDate(ds: pd.Series, row: int, selection: DateControl) -> bool:
            # Evaluate the truth of a date based on control arguments
            yr = False
            qtr = False
            mnth = False
            if selection.Year is None:
                yr = True
            else:
                if ds[row].year == selection.Year:
                    yr = True
            if selection.Quarter is None:
                qtr = True
            else:
                if ds[row].quarter == selection.Quarter:
                    qtr = True
            if selection.Month is None:
                mnth = True
            else:
                if ds[row].month == selection.Month:
                    mnth = True
            return yr and qtr and mnth
        
        # Use control to update all cols named in colList
        mask = list(EvalDate(df[dtecol], x, control) for x in range(len(df[dtecol])))
        mod = list((1.0 + pcnt_val) if x else 1.0 for x in mask)
        print(mask)
        print(mod)
        for c in colList:
             df[c] = list(df.iloc[x][c] * mod[x] for x in range(len(df[c])))     
        return df    
    

    The updateFrame function takes two positional arguments:
    . df - the dataframe to be updated
    . pcnt_val - the percentage to be added to the current value

    The function also takes some keyword variables to include:

    • DTECOL - which is the Name of the df column containing dates
    • Columns - a list of column headings within the Df to be changed
    • Year - a year value or None if all years are to be changed
    • Quarter - a specific quarter integer 1 thru 4 inclusive or None
    • Month - a specific month to be changed or None

    Applying this function to your dataframe df as follows:

    dg = updateFrame(df, .25, DTECOL='Date', Columns=['Col_1', 'Col_2'], Year=2021, Quarter=3)  
    

    Yields:

        Date    Col_1   Col_2   Col_3   Col_4   Col_5
    0   2021-01-01  10.0    20.0    10  20  10
    1   2021-02-01  10.0    20.0    10  20  10
    2   2021-03-01  10.0    20.0    10  20  10
    3   2021-04-01  10.0    20.0    10  20  10
    4   2021-05-01  10.0    20.0    10  20  10
    5   2021-06-01  10.0    20.0    10  20  10
    6   2021-07-01  12.5    25.0    10  20  10
    7   2021-08-01  12.5    25.0    10  20  10
    8   2021-09-01  12.5    25.0    10  20  10
    9   2021-10-01  10.0    20.0    10  20  10
    10  2021-11-01  10.0    20.0    10  20  10
    11  2021-12-01  10.0    20.0    10  20  10
    

    Given that you would like to provide updates for all 4 quarters in one call this is how I would do it: add new function:

    def updateByQuarter(df:pd.DataFrame, changes: list, **args) -> pd.DataFrame:
        #  Given a quarterly change list of the form tuple(qtrid, chgval) Update the dataframe
        for chg in changes:
            args['Quarter'] = chg[0]
            df updateFrame(df, chg[1], **args)
        return df    
    

    Then create a list of changes by quarter

    # List of tuples defining the quarter and percent change
    qtrChg = [(1, 0.02),(2, 0.035),(3, -0.018),(4, 0.125)]  
    

    The use:

    df = updateByQuarter(df, [(1, 0.02), (2, 0.04), (3, -0.02), (4, 0.15)], DTECOL='Date', Columns=['Col_1', 'Col_2'])  
    

    This yields:

             Date  Col_1  Col_2  Col_3  Col_4  Col_5
    0  2021-01-01   10.2   20.4     10     20     10
    1  2021-02-01   10.2   20.4     10     20     10
    2  2021-03-01   10.2   20.4     10     20     10
    3  2021-04-01   10.4   20.8     10     20     10
    4  2021-05-01   10.4   20.8     10     20     10
    5  2021-06-01   10.4   20.8     10     20     10
    6  2021-07-01    9.8   19.6     10     20     10
    7  2021-08-01    9.8   19.6     10     20     10
    8  2021-09-01    9.8   19.6     10     20     10
    9  2021-10-01   11.5   23.0     10     20     10
    10 2021-11-01   11.5   23.0     10     20     10
    11 2021-12-01   11.5   23.0     10     20     10