
Creating Modified Values inside a Column in Pandas DataFrame


I am not sure how hard this problem is, but I am stuck and need help.

I have a sample pandas dataframe as follows:

       df 

     C   A      V   D
     9  apar    1   0
     8  bpar    4   8
     7  cpar    7   7
     0  apar    8   6
     8  apar    9   4
     9  bpar    3   2

What I need to do is append rows to the existing dataframe: wherever column 'A' has the value 'apar', I want to add a new row with 'A' set to 'apar_t' and the value of 'V' reduced by some amount (say 0.5), leaving the other columns unchanged. For this toy example, the updated dataframe should look like this:

    df

       C    A       V     D
       9    apar    1.0   0
       8    bpar    4.0   8
       7    cpar    7.0   7
       0    apar    8.0   6
       8    apar    9.0   4
       9    bpar    3.0   2
       9    apar_t  0.5   0
       0    apar_t  7.5   6
       8    apar_t  8.5   4

I have been able to solve the problem, but I think my approach is not Pythonic and not efficient enough for a huge dataset. I would like to know whether there is a better way to solve it.

What I did was the following:

       sub_df = df[df['A']=='apar']
       colsOrder = df.columns
       sub_df = sub_df.rename(columns={'A': 'A1', 'V': 'V1'})

       sub_df['A'] ='apar_t'
       sub_df['V'] = sub_df['V1'] - 0.5

       sub_df = sub_df.drop(columns=['A1', 'V1'])
       sub_df = sub_df[colsOrder]

       frames =[df,sub_df]
       DF = pd.concat(frames).reset_index(drop=True)
       DF

The code works and I get what I want, but I am looking for a more elegant, Pythonic, and efficient solution. Any help will be appreciated.


Solution

  • This is a clean way to do it when you are only adding to or subtracting from the values in each row:

    pd.concat([df, df.loc[df.A == 'apar'].apply(
        lambda row: row.add([0, '_t', -0.5, 0]), axis=1)])
    

    Resulting dataframe:

       C       A    V  D
    0  9    apar  1.0  0
    1  8    bpar  4.0  8
    2  7    cpar  7.0  7
    3  0    apar  8.0  6
    4  8    apar  9.0  4
    5  9    bpar  3.0  2
    0  9  apar_t  0.5  0
    3  0  apar_t  7.5  6
    4  8  apar_t  8.5  4
    

    Otherwise, you can define a function for your row transformation:

    def transform_row(row):
        row['A'] = row['A'] + '_t'
        row['V'] = row['V'] - 0.5
        return row
    

    and then use apply:

    pd.concat([df, df.loc[df.A == 'apar'].apply(transform_row, axis=1)])
    

    The resulting dataframe is identical to the above.
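
Since the question mentions huge datasets, a column-wise variant may be worth considering: apply with axis=1 calls a Python function once per row, while the sketch below works on whole columns at a time. It is only an illustration under the toy data from the question. The helper name append_shifted and its parameters label, suffix, and delta are hypothetical and not part of the original answer.

```python
import pandas as pd

# Toy data from the question.
df = pd.DataFrame({
    "C": [9, 8, 7, 0, 8, 9],
    "A": ["apar", "bpar", "cpar", "apar", "apar", "bpar"],
    "V": [1, 4, 7, 8, 9, 3],
    "D": [0, 8, 7, 6, 4, 2],
})


def append_shifted(df, label="apar", suffix="_t", delta=-0.5):
    """Append a modified copy of the rows where column A equals `label`.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    mask = df["A"] == label
    # assign() returns a new frame, so the original df is left untouched.
    shifted = df.loc[mask].assign(
        A=lambda d: d["A"] + suffix,  # 'apar' -> 'apar_t'
        V=lambda d: d["V"] + delta,   # shift V by delta, column-wise
    )
    # ignore_index=True renumbers the rows instead of repeating 0, 3, 4.
    return pd.concat([df, shifted], ignore_index=True)


result = append_shifted(df)
print(result)
```

Unlike the apply versions above, this renumbers the index from 0 to 8. Drop ignore_index=True if the original row labels should be kept on the appended rows.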