Search code examples
pythonpandasdataframegroup-by

Is there a way in pandas to increment a value in a col by 1, once a condition is met?


I want PON column to be populated from refrence_dict with df['LID'] as keys and increment the values in df['PON'] based on a condition i.e.

conditions = [(dff['Mode'].shift(1) != 'SD') & (dff['Mode']=='SD')]

My dataframe is:

refrence_dict={'A': 100, 'B': 80, 'C': 30, 'D': 400}
df0 = pd.DataFrame({'LID': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C'], 
                     'Mode': ['FS', 'SD', 'SD', 'FS', 'SD', 'SD', 'FS', 'FS', 'FS', 'SD', 'SD', 'FS', 'SD', 'SD', 'FS']})

What I want is:

df1 = pd.DataFrame({'LID': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
                   'Mode': ['FS', 'SD', 'SD', 'FS', 'SD', 'SD', 'FS', 'FS', 'FS', 'SD', 'SD', 'FS', 'SD',   'SD', 'FS'],
                   'PON': [100, 101, 101, 101, 102, 81, 81, 81, 81, 82, 31, 31, 32, 32, 32 ]})

What I have tried so far is : Sort the values by LID, then populate the new temporary column PON_tmp from refrence_dict, then populate PON column with values from PON_temp only if the condition is true; else, fill it with NAN value. Finally, I used cumsum() on PON column, but instead of incrementing by 1, it's adding the values.

Code for the above steps:

def sort_values(df):
    print(f"\nSorting values")
    df.sort_values(by=['LID'], inplace=True,)
    return df 

def replace_po(df): 
    df["PON_tmp"]=df["LID"].replace(refrence_dict) 
    return df 
 
def op_cumsum(df):
    df['PON'] = df.groupby(['LID'])['PO'].cumsum()
    return df

def process_1(df, val):
    conditions = [(df['Mode'].shift(1) != val) & (df['Mode']==val)]
    aggregation = {'reboot_rate': ('Mode','size')}
    choices = [df['PON_tmp']]
    df['PO']= np.select(conditions, choices, default=np.NAN )
    return df

dfm = (df0.
    pipe(sort_values).
    pipe(replace_po). #create new col PON_tmp
    pipe(process_1, 'SD').
    pipe(op_cumsum))

Solution

  • You're close, you just need to groupby:

    df0['PON'] = (((df0['Mode'].shift(1) != 'SD') & (df0['Mode']=='SD'))
      .groupby(df0['LID']).cumsum()
      .add(df0['LID'].map(refrence_dict)+1)
    )
    

    Output:

       LID Mode  PON
    0    A   FS  101
    1    A   SD  102
    2    A   SD  102
    3    A   FS  102
    4    A   SD  103
    5    B   SD   81
    6    B   FS   81
    7    B   FS   81
    8    B   FS   81
    9    B   SD   82
    10   C   SD   31
    11   C   FS   31
    12   C   SD   32
    13   C   SD   32
    14   C   FS   32