Search code examples
pythonpandassortingcolumnsorting

Sorting a Data Frame by Month with Repeating Years, based on Unique 'Other' Column


In pandas, I am trying to sort rows of a large data frame by months. At the moment, the months are out of order. They are sorted alphabetically, but I would like to sort them chronologically. The tricky part is that I am sorting by a cycle of 21 months for every one product. There are two year columns, one for calendar year and one for fiscal year, and they differ on purpose. Fiscal Year 2021 is January 2021 - September 2021, and Fiscal Year 2022 is October 2021 - September 2022. There are hundreds of products, and the section below is just a sample of two products.

As seen in the table below, the months are out of order, but everything else is in the right order.

Again, ever product has 21 months, from January 2021 to September 2022. I want these to iterate in order for every product.

I am looking for a code to sort this data frame in the right way.

How it looks now (months not chronological by year):

Item Calendar Year Fiscal Year Month Amount
Product 1 2021 2021 April 45
Product 1 2021 2021 August 85
Product 1 2021 2021 February 25
Product 1 2021 2021 January 15
Product 1 2021 2021 July 75
Product 1 2021 2021 June 65
Product 1 2021 2021 March 35
Product 1 2021 2021 May 55
Product 1 2021 2021 September 95
Product 1 2021 2022 December 125
Product 1 2021 2022 November 115
Product 1 2021 2022 October 105
Product 1 2022 2022 April 405
Product 1 2022 2022 August 805
Product 1 2022 2022 February 205
Product 1 2022 2022 January 1005
Product 1 2022 2022 July 705
Product 1 2022 2022 June 605
Product 1 2022 2022 March 305
Product 1 2022 2022 May 505
Product 1 2022 2022 September 905
Product 2 2021 2021 April 4000
Product 2 2021 2021 August 8000
Product 2 2021 2021 February 2000
Product 2 2021 2021 January 1000
Product 2 2021 2021 July 7000
Product 2 2021 2021 June 6000
Product 2 2021 2021 March 3000
Product 2 2021 2021 May 5000
Product 2 2021 2021 September 9000
Product 2 2021 2022 December 12000
Product 2 2021 2022 November 11000
Product 2 2021 2022 October 10000
Product 2 2022 2022 April 40000
Product 2 2022 2022 August 80000
Product 2 2022 2022 February 20000
Product 2 2022 2022 January 10000
Product 2 2022 2022 July 70000
Product 2 2022 2022 June 60000
Product 2 2022 2022 March 30000
Product 2 2022 2022 May 50000
Product 2 2022 2022 September 90000

How it should look (months in order):

Item Calendar Year Fiscal Year Month Amount
Product 1 2021 2021 January 15
Product 1 2021 2021 February 25
Product 1 2021 2021 March 35
Product 1 2021 2021 April 45
Product 1 2021 2021 May 55
Product 1 2021 2021 June 65
Product 1 2021 2021 July 75
Product 1 2021 2021 August 85
Product 1 2021 2021 September 95
Product 1 2021 2022 October 105
Product 1 2021 2022 November 115
Product 1 2021 2022 December 125
Product 1 2022 2022 January 1005
Product 1 2022 2022 February 205
Product 1 2022 2022 March 305
Product 1 2022 2022 April 405
Product 1 2022 2022 May 505
Product 1 2022 2022 June 605
Product 1 2022 2022 July 705
Product 1 2022 2022 August 805
Product 1 2022 2022 September 905
Product 2 2021 2021 January 1000
Product 2 2021 2021 February 2000
Product 2 2021 2021 March 3000
Product 2 2021 2021 April 4000
Product 2 2021 2021 May 5000
Product 2 2021 2021 June 6000
Product 2 2021 2021 July 7000
Product 2 2021 2021 August 8000
Product 2 2021 2021 September 9000
Product 2 2021 2022 October 10000
Product 2 2021 2022 November 11000
Product 2 2021 2022 December 12000
Product 2 2022 2022 January 10000
Product 2 2022 2022 February 20000
Product 2 2022 2022 March 30000
Product 2 2022 2022 April 40000
Product 2 2022 2022 May 50000
Product 2 2022 2022 June 60000
Product 2 2022 2022 July 70000
Product 2 2022 2022 August 80000
Product 2 2022 2022 September 90000

Solution

  • First convert values to ordered categoricals, so possible sorting by multiple columns in DataFrame.sort_values:

    cat = ['January','February','March','April','May','June',
           'July','August','September','October','November','December']
    df['Month'] = pd.Categorical(df['Month'], ordered=True, categories=cat)
    df = df.sort_values(['Item','Calendar Year','Month'])
    

    Or create DatetimeIndex, so possible sorting by Item with datetimes:

    df.index = pd.to_datetime(df['Calendar Year'] + df['Month'], format='%Y%B')
    df = df.rename_axis('dt').sort_values(['Item','dt']).reset_index(drop=True)