python pandas dataframe categorical-data

Use a categorical column to order the dataframe according to an array


I have an array like this:

['A 100', 'A 200', 'A 300', 'A 400', 'A 500', 'B 100', 'B 200', 'B 300', 'B 400']

I also have a dataframe like this:

BIN      CA      SUM
100       B      B 100
300       A      A 300
300       B      B 300
400       B      B 400
400       A      A 400
200       B      B 200
100       A      A 100
200       A      A 200

I want to use pd.Categorical to sort the dataframe's rows so that the SUM column follows the order of the array.

The expected output is:

BIN      CA      SUM
100       A      A 100
200       A      A 200
300       A      A 300
400       A      A 400
100       B      B 100
200       B      B 200
300       B      B 300
400       B      B 400

Solution

  • You can use pd.Categorical to convert the SUM column to an ordered categorical column whose category order is taken from arr, then sort by that column:

    df['SUM'] = pd.Categorical(df['SUM'], categories=arr, ordered=True)
    df.sort_values('SUM')
    

    Alternatively, you can create a dictionary that maps each item in arr to its position, .map this dictionary onto the SUM column, and use np.argsort to get the row positions that would sort the dataframe:

    dct = {v: i for i, v in enumerate(arr)}
    df.iloc[np.argsort(df['SUM'].map(dct))]
    

    Result:

       BIN CA    SUM
    6  100  A  A 100
    7  200  A  A 200
    1  300  A  A 300
    4  400  A  A 400
    0  100  B  B 100
    5  200  B  B 200
    2  300  B  B 300
    3  400  B  B 400
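
For reference, here is a self-contained sketch of both approaches, with the question's data rebuilt by hand. The names by_cat, by_map and rank are illustrative and not part of the original answer. Using assign (so the original df is left unchanged) and kind='stable' are small variations on the snippets above, not requirements.

```python
import numpy as np
import pandas as pd

# Desired order of the SUM values (taken from the question).
arr = ['A 100', 'A 200', 'A 300', 'A 400', 'A 500',
       'B 100', 'B 200', 'B 300', 'B 400']

# The question's dataframe, rebuilt by hand.
df = pd.DataFrame({
    'BIN': [100, 300, 300, 400, 400, 200, 100, 200],
    'CA':  ['B', 'A', 'B', 'B', 'A', 'B', 'A', 'A'],
    'SUM': ['B 100', 'A 300', 'B 300', 'B 400',
            'A 400', 'B 200', 'A 100', 'A 200'],
})

# Approach 1: ordered categorical, then sort by that column.
# assign() returns a copy, so df itself keeps its string column.
by_cat = df.assign(
    SUM=pd.Categorical(df['SUM'], categories=arr, ordered=True)
).sort_values('SUM')

# Approach 2: map each value to its position in arr, then argsort.
rank = df['SUM'].map({v: i for i, v in enumerate(arr)})
by_map = df.iloc[np.argsort(rank.to_numpy(), kind='stable')]

print(by_cat)
print(by_map)
```

It does not matter that 'A 500' appears in arr but not in the dataframe. The reverse case does matter: a SUM value that is missing from arr becomes NaN under pd.Categorical, so it is worth checking that arr covers every value before sorting.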