Search code examples
pandasdataframebooleanquery

pandas: output index value when values in a column become greater than last value in each column


Full question-

Search each column in a DataFrame to determine when the first instance of a value greater than a value stored in the last row of each column in the DataFrame and output the index

Ex. of df.head():

   Well               A1          A2          A3          A4           
Temperature                                                               
    25.0         371.335253  360.026443  253.228769  593.436104     
    25.2         331.957145  332.224668  233.607595  561.057715    
    25.4         305.472591  303.777874  213.500582  535.310186   
    25.6         285.713623  274.069361  202.024427  515.261876    
    25.8         252.716374  254.610848  181.719415  488.988468    

Ex. of df.tail():

Well                       A1          A2           A3           A4
Temperature
 94.79                -441.775980 -664.549239  1060.674188  1158.481056   
 94.99                -492.189733 -709.521424  1029.628209  1087.625128   
 mean                  280.759521  283.417750   201.471571   519.939366   
 std                    72.404373   69.023406    45.447202    58.150127   
 4*std                 570.377014  559.511373   383.260378   752.539875   

I want to use the value of 4*std in A1 (570.37) and search starting from the top of the column for the first value greater than (570.37) in A1 and output the Temperature. I need to repeat this for all columns.

I would like the output as a new dataframe like this example below...I'm lost on how to construct this?

Well   Temp
A1     26.0
A2     27.6
A3     26.8
...    ...
H12    27.2

I would appreciate any help!


Solution

  • I believe you need if greater value for each column always exist:

    print (df)
                               A1          A2           A3           A4
    Well Temperature                                                   
    25.0               371.335253  360.026443   253.228769   593.436104
    25.2               331.957145  632.224668   233.607595   561.057715
    25.4              3005.472591  303.777874   213.500582   535.310186
    25.6               285.713623  274.069361   202.024427   515.261876
    25.8               252.716374  254.610848   181.719415   488.988468
    94.79             -441.775980 -664.549239  1060.674188  1158.481056
    94.99             -492.189733 -709.521424  1029.628209  1087.625128
    mean               280.759521  283.417750   201.471571   519.939366
    std                 72.404373   69.023406    45.447202    58.150127
    4*std              570.377014  559.511373   383.260378   752.539875
    
    
    df1 = df.iloc[:-3].gt(df.iloc[-1]).idxmax().rename_axis('Well').reset_index(name='Temp')
    print (df1)
      Well   Temp
    0   A1   25.4
    1   A2   25.2
    2   A3  94.79
    3   A4  94.79
    

    Details:

    print (df.iloc[:-3].gt(df.iloc[-1]))
                         A1     A2     A3     A4
    Well Temperature                            
    25.0              False  False  False  False
    25.2              False   True  False  False
    25.4               True  False  False  False
    25.6              False  False  False  False
    25.8              False  False  False  False
    94.79             False  False   True   True
    94.99             False  False   True   True
    
    print (df.iloc[:-3].gt(df.iloc[-1]).idxmax())
    A1     25.4
    A2     25.2
    A3    94.79
    A4    94.79
    dtype: object
    

    If is possible some value is not greater one possible solution is add new row to the end with NaN index:

    print (df)
                               A1          A2           A3           A4
    Well Temperature                                                   
    25.0               371.335253  360.026443   253.228769   593.436104
    25.2               331.957145  332.224668   233.607595   561.057715
    25.4              3005.472591  303.777874   213.500582   535.310186
    25.6               285.713623  274.069361   202.024427   515.261876
    25.8               252.716374  254.610848   181.719415   488.988468
    94.79             -441.775980 -664.549239  1060.674188  1158.481056
    94.99             -492.189733 -709.521424  1029.628209  1087.625128
    mean               280.759521  283.417750   201.471571   519.939366
    std                 72.404373   69.023406    45.447202    58.150127
    4*std              570.377014  559.511373   383.260378   752.539875
    df1 = df.iloc[:-3].append((df.iloc[-1] + 1).rename(np.nan))
    print (df1)
                               A1          A2           A3           A4
    Well Temperature                                                   
    25.0               371.335253  360.026443   253.228769   593.436104
    25.2               331.957145  332.224668   233.607595   561.057715
    25.4              3005.472591  303.777874   213.500582   535.310186
    25.6               285.713623  274.069361   202.024427   515.261876
    25.8               252.716374  254.610848   181.719415   488.988468
    94.79             -441.775980 -664.549239  1060.674188  1158.481056
    94.99             -492.189733 -709.521424  1029.628209  1087.625128
    NaN                571.377014  560.511373   384.260378   753.539875
    

    df2 = df1.gt(df.iloc[-1]).idxmax().rename_axis('Well').reset_index(name='Temp')
    print (df2)
      Well   Temp
    0   A1   25.4
    1   A2    NaN
    2   A3  94.79
    3   A4  94.79
    
    print (df1.gt(df.iloc[-1]))
                         A1     A2     A3     A4
    Well Temperature                            
    25.0              False  False  False  False
    25.2              False  False  False  False
    25.4               True  False  False  False
    25.6              False  False  False  False
    25.8              False  False  False  False
    94.79             False  False   True   True
    94.99             False  False   True   True
    NaN                True   True   True   True