For example, if i have a dataset in the first table below....
Name | Code | Thickness |
---|---|---|
CH1 | 3 | 0.5 |
CH1 | 3 | 0.3 |
CH1 | 4 | 0.4 |
CH1 | 3 | 0.2 |
CH1 | 5 | 0.6 |
CH1 | 5 | 0.4 |
.... and i want to achieve the result in the next table by grouping by the "Code" column and summing the "Thickness" column
Name | Code | Thickness | Grp_Thinckness |
---|---|---|---|
CH1 | 3 | 0.5 | 0.8 |
CH1 | 3 | 0.3 | 0.8 |
CH1 | 4 | 0.4 | 0.4 |
CH1 | 3 | 0.2 | 0.2 |
CH1 | 5 | 0.6 | 1.0 |
CH1 | 5 | 0.4 | 1.0 |
How do I go about this?
It's a gap-and-island problem. Anytime the Code
changes, it creates a new island. You can solve these problems with a cumsum
:
s = (df["Code"] != df["Code"].shift()).cumsum()
df["Grp_Thickness"] = df.groupby(s)["Thickness"].transform("sum")