Say I have a Pandas Dataframe below:
ID | value |
---|---|
0 | 1 |
1 | 2 |
2 | 1.2 |
3 | 1.6 |
4 | 1.5 |
5 | 1.4 |
6 | 1.1 |
I would like to group the Values within 20% margin of the first element of the group.
ID | value | group |
---|---|---|
0 | 1 | 0 |
2 | 1.2 | 0 |
6 | 1.1 | 0 |
1 | 2 | 1 |
3 | 1.6 | 1 |
4 | 1.5 | 2 |
5 | 1.4 | 2 |
What is the best way to do this?
Try:
df["group"] = np.nan
group = 0
while (m := df["group"].isna()).any():
val = df.loc[m.idxmax(), "value"]
df.loc[m & (np.abs(df["value"] - val) <= val * 0.2), "group"] = group
group += 1
print(df.sort_values(by="group"))
Prints:
ID value group
0 0 1.0 0.0
2 2 1.2 0.0
6 6 1.1 0.0
1 1 2.0 1.0
3 3 1.6 1.0
4 4 1.5 2.0
5 5 1.4 2.0