python pandas dataframe numpy logic

Efficient way of simplifying parts of a dataframe column with above values


The problem is to replace each sequence of nonzero numbers between zeros with the first value of that sequence.

Example:

Input (df['p'])    Desired output (df['Do'])
0                  0
0                  0
0                  0
115                115
90                 115
0                  0
-80                -80
-90                -80
-70                -80
0                  0

I have tried using np.where with some logic to find the values that must be changed, but I'm not able to scale this solution: it only works for one row.

Example:

                  #check a change in value   #eliminate the first change from zero   #assigning the above p value
df['A'] = np.where(df['p']!=df['p'].shift(1),np.where(df['p'].shift(1)==0,df['p'],df['p'].shift(1)),df['p'])

Input (df['p'])    Actual output (df['A'])
0                  0
0                  0
0                  0
115                115
90                 115
0                  0
-80                -80
-90                -80
-70                -90
0                  0

Something like the following is what I have in mind, but np.where evaluates the whole column at once, so the expression cannot refer to values of df['Do'] that are still being computed.

                  #check a change in value   #eliminate the first change from zero   #assigning the above Do value
   #\/                                                                                 \/
df['Do'] = np.where(df['p']!=df['p'].shift(1),np.where(df['p'].shift(1)==0,df['p'],df['Do'].shift(1)),df['p'])
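
For reference, the row-by-row logic described above can be written as a plain Python loop. This is only a sketch to make the intended recurrence explicit (each output depends on the previous output, which a single np.where call cannot express). The names `out` and `prev_p` are made up for illustration, and a loop like this will be slow on large frames.

```python
import pandas as pd

df = pd.DataFrame({'p': [0, 0, 0, 115, 90, 0, -80, -90, -70, 0]})

out = []      # hypothetical accumulator for the desired column
prev_p = 0    # value of p in the previous row (treated as 0 before the first row)
for p in df['p'].tolist():
    if p == 0:
        out.append(0)          # zeros stay zeros
    elif prev_p == 0:
        out.append(p)          # first nonzero value after a zero starts a sequence
    else:
        out.append(out[-1])    # inside a sequence: repeat the previous output
    prev_p = p

df['Do'] = out
print(df)
```

The third branch is the part the np.where attempt misses: it reads the previous p value rather than the previous output, so only the second row of each sequence is corrected.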

Thanks!


Solution

  • What you want are the rows that are not equal (ne) to 0 and whose previous row (obtained with shift) is not 0 either. Mask the values in those rows, then use ffill to propagate the last unmasked value, which is either a zero or the first value of the sequence.

    # input
    import pandas as pd
    df = pd.DataFrame({'p':[0,0,0,115,90,0,-80,-90,-70,0]})
    
    # get the result
    df['res'] = df['p'].mask(df['p'].ne(0) & df['p'].shift().ne(0)).ffill()
    
    print(df)
         p    res
    0    0    0.0
    1    0    0.0
    2    0    0.0
    3  115  115.0
    4   90  115.0
    5    0    0.0
    6  -80  -80.0
    7  -90  -80.0
    8  -70  -80.0
    9    0    0.0
    

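    Alternatively, you can compute the ne(0) check only once and shift the resulting mask:

    m = df['p'].ne(0)
    # fill_value=False keeps the shifted mask boolean (a plain shift() puts NaN in the first row)
    df['res'] = df['p'].mask(m & m.shift(fill_value=False)).ffill()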
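
    Because mask introduces NaN, the result above comes back as floats. If integer output matters, one possible variation (not part of the answer above, just a sketch) is to label each run of consecutive zero or nonzero rows and take the first value of every run with groupby. The names `nonzero`, `run_id` and `res_int` are invented for this example.

```python
import pandas as pd

df = pd.DataFrame({'p': [0, 0, 0, 115, 90, 0, -80, -90, -70, 0]})

nonzero = df['p'].ne(0)
# run_id increases by one each time the zero/nonzero state flips,
# so every block of consecutive zeros or nonzeros gets its own label
run_id = nonzero.ne(nonzero.shift()).cumsum()
# the first value of a zero run is 0, the first value of a nonzero run is its leading value
df['res_int'] = df.groupby(run_id)['p'].transform('first')

print(df)
```

    Whether this is faster than the mask and ffill version depends on the data, so it is worth timing both on a realistic frame.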