python sorting pandas numpy finance

Sort tenors in finance notation


I have an array of tenors:

Tenors = np.array(['10Y', '15Y', '1M', '1Y', '20Y', '2Y', '30Y', '3M', '5Y', '6M', '9M'])

where M stands for months and Y stands for years. The correctly sorted (ascending) order would then be:

['1M', '3M', '6M', '9M', '1Y', '2Y', '5Y', '10Y', '15Y', '20Y', '30Y']

How do I achieve that in Python with SciPy/NumPy? Since the tenors originate from a pandas DataFrame, a pandas-based solution would be fine as well.


Solution

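  • Approach #1 Here's a NumPy-based approach using np.char.replace (the public name for np.core.defchararray.replace). It rewrites 'M' as '00' and 'Y' as '0000', so each tenor becomes an integer key ('9M' becomes 900, '1Y' becomes 10000), and then sorts by that key. This only orders correctly while the month tenors stay below one year -

    import numpy as np

    repl = np.char.replace
    out = Tenors[repl(repl(Tenors,'M','00'),'Y','0000').astype(int).argsort()]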
    

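    Approach #2 If the input can contain month counts above one year, such as '18M', the padding trick no longer works ('18M' maps to 1800, which sorts below the 10000 of '1Y'). Instead, convert every tenor to a number of months and sort on that, like so -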

    def generic_case_vectorized(Tenors):
        # Get shorter names for functions
        repl = np.char.replace
        isalph = np.char.isalpha
    
        # Get scaling values: view the fixed-width strings one character at a
        # time (same string kind as the input), keep the unit letters, and map
        # 'Y' to 12 months and 'M' to 1 month
        TS1 = Tenors.view(Tenors.dtype.kind + '1')
        scale = repl(repl(TS1[isalph(TS1)],'Y','12'),'M','1').astype(int)
    
        # Get the numeric values by stripping the unit letters
        vals = repl(repl(Tenors,'M',''),'Y','').astype(int)
    
        # Finally scale numeric values to months and use sorted indices for sorting input arr
        return Tenors[(scale*vals).argsort()]
    

    Approach #3 Here's another approach that also handles the generic case, this time with a Python-level loop -

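    def generic_case_loopy(Tenors):
        # Split each tenor into its number and its unit letter
        arr = np.array([[i[:-1],i[-1]] for i in Tenors])
        # Months per unit: (unit=='Y')*11+1 is 12 for 'Y' and 1 for 'M'
        return Tenors[(arr[:,0].astype(int)*((arr[:,1]=='Y')*11+1)).argsort()]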
    

    Sample run, on an input that also contains '25M' and '18M' -

    In [84]: Tenors
    Out[84]: 
    array(['10Y', '15Y', '1M', '1Y', '20Y', '2Y', '30Y', '3M', '25M', '5Y',
           '6M', '18M'], 
          dtype='|S3')
    
    In [85]: generic_case_vectorized(Tenors)
    Out[85]: 
    array(['1M', '3M', '6M', '1Y', '18M', '2Y', '25M', '5Y', '10Y', '15Y',
           '20Y', '30Y'], 
          dtype='|S3')
    
    In [86]: generic_case_loopy(Tenors)
    Out[86]: 
    array(['1M', '3M', '6M', '1Y', '18M', '2Y', '25M', '5Y', '10Y', '15Y',
           '20Y', '30Y'], 
          dtype='|S3')
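Since the question mentions that the tenors come from a pandas DataFrame, the same convert-to-months idea can also be sketched with pandas string methods. This is only a minimal sketch and not part of the approaches above: the function name sort_tenors and the regex (digits followed by a single M or Y) are assumptions made for illustration, and a tenor that does not match that pattern should fail at the integer cast rather than be sorted silently.

```python
import pandas as pd

def sort_tenors(tenors):
    """Return the tenors as a list sorted by length in months (hypothetical helper)."""
    s = pd.Series(list(tenors))
    # Split each tenor into its number (column 0) and unit letter (column 1).
    parts = s.str.extract(r'^(\d+)([MY])$')
    # Convert to months: 'M' counts once, 'Y' counts twelve times.
    months = parts[0].astype(int) * parts[1].map({'M': 1, 'Y': 12})
    # A stable sort keeps the input order for ties such as '12M' and '1Y'.
    order = months.to_numpy().argsort(kind='stable')
    return s.iloc[order].tolist()

tenors = ['10Y', '15Y', '1M', '1Y', '20Y', '2Y', '30Y', '3M', '5Y', '6M', '9M']
print(sort_tenors(tenors))
# ['1M', '3M', '6M', '9M', '1Y', '2Y', '5Y', '10Y', '15Y', '20Y', '30Y']
```

If the tenors live in a DataFrame column, passing that column (for example df['tenor'], a hypothetical column name) to the function should work the same way.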