Search code examples
pythonpandasdataframegroup-by

Create multiple columns from a single column and group by pandas


work = pd.DataFrame({"JOB" : ['JOB01', 'JOB01', 'JOB02', 'JOB02', 'JOB03', 'JOB03'],
"STATUS" : ['ON_ORDER', 'ACTIVE','TO_BE_ALLOCATED', 'ON_ORDER', 'ACTIVE','TO_BE_ALLOCATED'],
"PART" : ['PART01', 'PART02','PART03','PART04','PART05','PART06']})

How can I use Pandas to groupby the JOB, split Status into columns based on the values and concatenate the Part field based on the JOB.

Desired Output:

JOB    | ON_ORDER  | ACTIVE   | TO_BE_ALLOCATED | PART_CON
JOB01  | True      | True     | False           | Part01\nPart02
JOB02  | True      | False    | True            | Part03\nPart04
JOB03  | False     | True     | True            | Part05\nPart06

Solution

  • Try:

    x = df.groupby("JOB")["PART"].agg(", ".join).rename("PART_CON")
    y = pd.crosstab(df["JOB"], df["STATUS"]).astype(bool)
    print(pd.concat([y, x], axis=1).reset_index())
    

    Prints:

         JOB  ACTIVE  ON_ORDER  TO_BE_ALLOCATED        PART_CON
    0  JOB01    True      True            False  PART01, PART02
    1  JOB02   False      True             True  PART03, PART04
    2  JOB03    True     False             True  PART05, PART06