python, pandas, minimum

Finding first minimum values python


How can I find the first of several minimum values in a dataset? Eventually I want to find, in sequence, the values that are at least 2 greater than that minimum.

For example,

import pandas as pd
import numpy as np
df = pd.DataFrame({'ID': [1,1,1,1,1,1,1], 'value': [0.6, 1.5, 1.6, 1.2, 2.8, 0.3, 0.2]})

I would like to identify df['value'][0] (0.6) as the first minimum in this array, and then identify df['value'][4] (2.8) as the first value that is at least 2 greater than that minimum.

My attempt so far:

df = pd.DataFrame({'ID': [1,1,1,1,1,1,1], 'value': [0.6, 1.5, 1.6, 1.2, 2.8, 0.3, 0.2]})
# Keep only local minima: values no greater than both neighbours
df['loc_min'] = df.value[(df.value.shift(1) >= df.value) & (df.value.shift(-1) >= df.value)]
# Carry each local minimum forward within its ID
df['loc_min'] = df.groupby('ID', sort=False)['loc_min'].ffill()
df['condition'] = df['value'] >= df['loc_min'] + 2

This works for other datasets, but not when the minimum is the first value.
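A likely cause, sketched on the same data: shift(1) leaves NaN in the first row and shift(-1) leaves NaN in the last row, and any comparison against NaN evaluates to False, so an edge row can never be flagged as a local minimum. The names prev_ok, next_ok and is_local_min are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 1, 1, 1, 1],
                   'value': [0.6, 1.5, 1.6, 1.2, 2.8, 0.3, 0.2]})

# shift(1) puts NaN in row 0; shift(-1) puts NaN in the last row.
prev_ok = df['value'].shift(1) >= df['value']
next_ok = df['value'].shift(-1) >= df['value']

# Comparisons against NaN are False, so rows 0 and 6 can never qualify.
is_local_min = prev_ok & next_ok
print(is_local_min.tolist())
# [False, False, False, True, False, False, False]
```

Only row 3 (1.2) is flagged, so the 0.6 in row 0 is never picked up as a minimum.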

The ideal output would be:

   ID  value  loc_min  condition
0   1    0.6      nan      False
1   1    1.5      0.6      False
2   1    1.6      0.6      False
3   1    1.2      0.6      False
4   1    2.8      0.6      True
5   1    0.3      0.3      False
6   1    0.2      0.2      False

As suggested in a comment, a loop would be a better way to go about this.
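One way such a loop might look, as a minimal sketch: it tracks the smallest value seen so far within each ID and flags rows that are at least 2 above it. The function name flag_rises and its threshold parameter are hypothetical, and treating the "first minimum" as the running minimum is an assumption about the intended behaviour.

```python
import pandas as pd

def flag_rises(df, threshold=2.0):
    """Return (running_min, condition) lists, computed row by row per ID."""
    running_min = {}  # smallest value seen so far for each ID
    mins, flags = [], []
    for id_, value in zip(df['ID'], df['value']):
        # Include the current row in the running minimum.
        current = min(running_min.get(id_, value), value)
        running_min[id_] = current
        mins.append(current)
        flags.append(value >= current + threshold)
    return mins, flags

df = pd.DataFrame({'ID': [1, 1, 1, 1, 1, 1, 1],
                   'value': [0.6, 1.5, 1.6, 1.2, 2.8, 0.3, 0.2]})
mins, flags = flag_rises(df)
print(mins)   # [0.6, 0.6, 0.6, 0.6, 0.6, 0.3, 0.2]
print(flags)  # [False, False, False, False, True, False, False]
```

Unlike the ideal output above, this sketch reports 0.6 rather than nan for the first row, because the first value counts as its own minimum.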


Solution

  • It seems you need cummin and a simple comparison:

    df['cummin_'] = df.groupby('ID').value.cummin()
    df['condition'] = df.value >= df.cummin_ + 2
    
    
        ID  value   cummin_ condition
    0   1   0.6     0.6     False
    1   1   1.5     0.6     False
    2   1   1.6     0.6     False
    3   1   1.2     0.6     False
    4   1   2.8     0.6     True
    5   1   0.3     0.3     False
    6   1   0.2     0.2     False
    

    Another option is to use expanding. Take, for example,

    df = pd.DataFrame({'ID': [1,1,1,1,1,1,1,2,2], 'value': [0.6, 1.5, 1.6, 1.2, 2.8, 0.3, 0.2,0.4,2.9]})
    

    Then

    df.groupby('ID').value.expanding(2).min()
    
        ID   
    1   0    NaN
        1    0.6
        2    0.6
        3    0.6
        4    0.6
        5    0.3
        6    0.2
    2   7    NaN
        8    0.4
    

    With a minimum of 2 observations, expanding yields NaN for the first row of each group, as in your ideal output, whereas cummin counts the first value as a minimum. Which one to use depends on how you want the results to be interpreted.
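To connect both variants back to the original goal, the sketch below computes the two running minima on the two-ID example and then picks the first row per ID that meets the condition. The names exp_min, cond_cummin, cond_expanding and first_hits are illustrative only. droplevel(0) removes the ID level that groupby(...).expanding() adds to the index, so the result aligns with df.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 1, 1, 1, 1, 2, 2],
                   'value': [0.6, 1.5, 1.6, 1.2, 2.8, 0.3, 0.2, 0.4, 2.9]})

# Running minimum per ID, first row included.
df['cummin_'] = df.groupby('ID')['value'].cummin()

# Expanding minimum per ID; NaN until two observations are available.
exp_min = df.groupby('ID')['value'].expanding(min_periods=2).min()
df['exp_min'] = exp_min.droplevel(0)  # drop the ID level to align with df

df['cond_cummin'] = df['value'] >= df['cummin_'] + 2
df['cond_expanding'] = df['value'] >= df['exp_min'] + 2  # NaN rows compare False

# First row per ID that is at least 2 above the running minimum.
first_hits = df[df['cond_cummin']].groupby('ID').head(1)
print(first_hits[['ID', 'value', 'cummin_']])
```

On this data both conditions flag the same rows, index 4 (2.8) and index 8 (2.9). They can only differ on the first row of a group, where the cummin comparison is always False (a value is never 2 above itself) and the expanding comparison is False because of the NaN.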