
pandas groupby per-group value


I have this data:

import pandas as pd

df = pd.DataFrame({
    "dim1":   [ "aaa", "aaa", "aaa", "aaa", "aaa", "aaa" ],
    "dim2":   [ "xxx", "xxx", "xxx", "yyy", "yyy", "yyy" ],
    "iter":   [     0,     1,     2,     0,     1,     2 ],
    "value1": [   100,   101,    99,   500,   490,   510 ],
    "value2": [ 10000, 10100,  9900, 50000, 49000, 51000 ],
})

I then group by dim1/dim2 and, across all iterations, keep value1/value2 from the row with the minimum value1:

df = df.groupby(["dim1", "dim2"], group_keys=False) \
    .apply(lambda x: x.sort_values("value1").head(1)).drop(columns=["iter"])

which returns:

dim1    dim2    value1  value2
 aaa    xxx         99    9900
 aaa    yyy        490   49000
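A minimal sketch of the same per-(dim1, dim2) selection, swapping sort_values(...).head(1) for idxmin (a substitution shown for comparison, not the code above). idxmin returns the index label of the row with the smallest value1 in each group, and .loc then pulls those rows from the original frame:

```python
import pandas as pd

df = pd.DataFrame({
    "dim1":   ["aaa", "aaa", "aaa", "aaa", "aaa", "aaa"],
    "dim2":   ["xxx", "xxx", "xxx", "yyy", "yyy", "yyy"],
    "iter":   [0, 1, 2, 0, 1, 2],
    "value1": [100, 101, 99, 500, 490, 510],
    "value2": [10000, 10100, 9900, 50000, 49000, 51000],
})

# For each (dim1, dim2) group, get the index label of the row with the
# smallest value1 (ties resolve to the first occurrence).
best_labels = df.groupby(["dim1", "dim2"])["value1"].idxmin()

# Select those rows from the original frame and drop the iteration counter.
best = df.loc[best_labels].drop(columns=["iter"])
print(best)
```

Like the apply version, this keeps the original row labels (2 and 4), and it avoids sorting every group.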

My question: how can I add a new column that contains the minimum value1 within each dim1 group, like this:

dim1    dim2    value1  value2     new_col
 aaa    xxx         99    9900          99
 aaa    yyy        490   49000          99

I tried something like this, which didn't work:

df["new_col"] = df.groupby(["dim1"], group_keys=False) \
    .apply(lambda x: x.value1.head(1))

Solution

  • If I understand correctly, you can apply .groupby + .transform to the result afterwards:

    df["new_col"] = df.groupby("dim1")["value1"].transform("min")
    print(df)
    

    Prints:

      dim1 dim2  value1  value2  new_col
    2  aaa  xxx      99    9900       99
    4  aaa  yyy     490   49000       99
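Why transform fits here: agg("min") collapses each dim1 group to a single value, while transform("min") broadcasts that value back to every row and keeps the frame's index, so it can be assigned directly as a column. A small sketch comparing the two, with the two-row result of the first step rebuilt by hand from the output above (the map lookup is shown only as an equivalent way to get the same column):

```python
import pandas as pd

# The two-row result of the first step, rebuilt directly; index labels 2 and 4
# match the rows kept from the original frame.
best = pd.DataFrame(
    {"dim1": ["aaa", "aaa"], "dim2": ["xxx", "yyy"],
     "value1": [99, 490], "value2": [9900, 49000]},
    index=[2, 4],
)

# agg collapses each dim1 group to one row: a Series indexed by dim1.
per_group_min = best.groupby("dim1")["value1"].agg("min")

# transform returns one value per row of best, aligned on best's index,
# which is why it can be assigned straight into a new column.
best["new_col"] = best.groupby("dim1")["value1"].transform("min")

# Equivalent column: look up each row's dim1 in the aggregated Series.
same = best["dim1"].map(per_group_min)

print(per_group_min)
print(best)
```

Assigning per_group_min directly would not work, because its index ("aaa") does not match the row labels (2, 4); transform avoids that alignment problem.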