Tags: python, python-3.x, pandas, dataframe, pandas-groupby

Python pandas: how to group similar rows together


I want to group similar rows together. Here is my DataFrame:

  SKU
FATUT
GUYGE
FATUT-01
SUPAU
GUYPE
SUPAU-01 
FATUT-02
GUYGE-01 

My expected DataFrame looks like this:

     SKU
   FATUT
FATUT-01
FATUT-02
   GUYGE
GUYGE-01
   SUPAU
SUPAU-01
   GUYPE

I want rows that belong to the same group to appear one after another.


Solution

  • One option is to use groupby with the parameter sort=False; then concatenate the split DataFrames.

    How it works:

    • Group df by the part of each SKU that comes before the dash, i.e. df['SKU'].str.split('-').str[0].
    • groupby sorts the groups by key by default; with sort=False, the groups are kept in the order in which their keys first appear in df, so "GUYPE" stays after "SUPAU".
    • Iterating over a groupby object yields (key, sub-DataFrame) pairs, much like iterating over the items of a dictionary. The generator expression discards each key and keeps only the sub-DataFrame.
    • concat joins the sub-DataFrames back into one; ignore_index=True discards the row labels inherited from df and gives the result a fresh index starting at 0.
    out = pd.concat((d for _, d in df.groupby(df['SKU'].str.split('-').str[0], sort=False)), ignore_index=True)
    

    Output:

            SKU
    0     FATUT
    1  FATUT-01
    2  FATUT-02
    3     GUYGE
    4  GUYGE-01
    5     SUPAU
    6  SUPAU-01
    7     GUYPE
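
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the steps above. The DataFrame is rebuilt by hand from the question (assuming the SKUs have no stray whitespace), and the names `prefix` and `pieces` are only illustrative:

```python
import pandas as pd

# Sample data copied from the question (assumed to contain no stray whitespace)
df = pd.DataFrame({'SKU': ['FATUT', 'GUYGE', 'FATUT-01', 'SUPAU', 'GUYPE',
                           'SUPAU-01', 'FATUT-02', 'GUYGE-01']})

# Grouping key: the part of each SKU before the dash
prefix = df['SKU'].str.split('-').str[0]

# sort=False keeps the groups in the order their keys first appear in df;
# iterating over the groupby object yields (key, sub-DataFrame) pairs
pieces = [group for _, group in df.groupby(prefix, sort=False)]

# Stitch the pieces back together and renumber the rows from 0
out = pd.concat(pieces, ignore_index=True)
print(out)
```

Within each group the rows keep their original relative order; they are not sorted. With this data that happens to coincide with sorted order.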
    

    But I feel like, for your task, sort_values might work as well. It sorts the SKUs alphabetically, so the order of the groups is not exactly the same as in the desired output ("GUYPE" ends up before "SUPAU"):

    df = df.sort_values(by='SKU', ignore_index=True)
    

    Output:

            SKU
    0     FATUT
    1  FATUT-01
    2  FATUT-02
    3     GUYGE
    4  GUYGE-01
    5     GUYPE
    6     SUPAU
    7  SUPAU-01
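
If a sort-based approach is preferred but the groups should still follow their order of first appearance, one possible variation (not part of the original answer) is to rank each prefix with pd.factorize and apply a stable sort on that rank. The sketch below compares it with the plain sort_values call; the variable names are illustrative and the data is again rebuilt by hand from the question:

```python
import pandas as pd

# Sample data copied from the question (assumed to contain no stray whitespace)
df = pd.DataFrame({'SKU': ['FATUT', 'GUYGE', 'FATUT-01', 'SUPAU', 'GUYPE',
                           'SUPAU-01', 'FATUT-02', 'GUYGE-01']})

# Plain alphabetical sort: groups end up in alphabetical order
by_value = df.sort_values(by='SKU', ignore_index=True)

# factorize assigns each distinct prefix an integer code,
# numbered in order of first appearance (FATUT=0, GUYGE=1, SUPAU=2, GUYPE=3)
prefix = df['SKU'].str.split('-').str[0]
codes, _ = pd.factorize(prefix)

# A stable sort on the codes gathers each group together
# while keeping the original row order inside every group
by_first_seen = df.iloc[codes.argsort(kind='stable')].reset_index(drop=True)

print(by_value['SKU'].tolist())
print(by_first_seen['SKU'].tolist())
```

The two results differ only in where "GUYPE" lands: before "SUPAU" in the alphabetical sort, after it in the first-appearance ordering.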