Search code examples
pythonpandasdataframepandas-groupbyrows

Grouping rows from Pandas Dataframe with column value within 20% of first value in group


Say I have a Pandas Dataframe below:

ID value
0 1
1 2
2 1.2
3 1.6
4 1.5
5 1.4
6 1.1

I would like to group the Values within 20% margin of the first element of the group.

ID value group
0 1 0
2 1.2 0
6 1.1 0
1 2 1
3 1.6 1
4 1.5 2
5 1.4 2

What is the best way to do this?


Solution

  • Try:

    df["group"] = np.nan
    group = 0
    while (m := df["group"].isna()).any():
        val = df.loc[m.idxmax(), "value"]
        df.loc[m & (np.abs(df["value"] - val) <= val * 0.2), "group"] = group
        group += 1
    
    print(df.sort_values(by="group"))
    

    Prints:

       ID  value  group
    0   0    1.0    0.0
    2   2    1.2    0.0
    6   6    1.1    0.0
    1   1    2.0    1.0
    3   3    1.6    1.0
    4   4    1.5    2.0
    5   5    1.4    2.0