
python sorting negative and/or decimal alphanumeric strings


I'm having problems sorting a list of strings that contain negative and/or decimal numbers mixed with letters. This is what I have so far:

import re

format_ids = ["synopsys_SS_2v_-40c_SS.lib",
              "synopsys_SS_1v_-40c_SS.lib",
              "synopsys_SS_1.2v_-40c_SS.lib", 
              "synopsys_SS_1.4v_-40c_SS.lib",
              "synopsys_SS_2v_-40c_TT.lib",
              "synopsys_FF_3v_25c_FF.lib",
              "synopsys_TT_4v_125c_TT.lib",
              "synopsys_TT_1v_85c_TT.lib",
              "synopsys_TT_10v_85c_TT.lib",
              "synopsys_FF_3v_-40c_SS.lib",
              "synopsys_FF_3v_-40c_TT.lib"]

selector = r'.*(FF|TT|SS)_([-\.\d]+v)_([-\.\d]+c)_(FF|TT|SS).*'
#key = [2,1,3]
key = 2
produce_groups = False

if isinstance(key, int):
    key = [key]

convert = lambda text: float(text) if text.isdigit() else text
alphanum_key = lambda k: [convert(c) for c in re.split('([-.\d]+)', k)]
split_list = lambda name: tuple(alphanum_key(re.findall(selector,name)[0][i]) for i in key)
format_ids.sort(key=split_list)

print("\n".join(format_ids))

I'm expecting the following output (sorting by the third capture group, the temperature, with key = 2):

synopsys_SS_2v_-40c_SS.lib
synopsys_SS_1v_-40c_SS.lib
synopsys_SS_1.2v_-40c_SS.lib
synopsys_SS_1.4v_-40c_SS.lib
synopsys_SS_2v_-40c_TT.lib
synopsys_FF_3v_-40c_SS.lib
synopsys_FF_3v_-40c_TT.lib
synopsys_FF_3v_25c_FF.lib
synopsys_TT_1v_85c_TT.lib
synopsys_TT_10v_85c_TT.lib
synopsys_TT_4v_125c_TT.lib

But I'm getting the following (all the negative numbers are listed last):

synopsys_FF_3v_25c_FF.lib
synopsys_TT_1v_85c_TT.lib
synopsys_TT_10v_85c_TT.lib
synopsys_TT_4v_125c_TT.lib
synopsys_SS_2v_-40c_SS.lib
synopsys_SS_1v_-40c_SS.lib
synopsys_SS_1.2v_-40c_SS.lib
synopsys_SS_1.4v_-40c_SS.lib
synopsys_SS_2v_-40c_TT.lib
synopsys_FF_3v_-40c_SS.lib
synopsys_FF_3v_-40c_TT.lib

Now, for the decimals in the second capture group, the voltage (changing the key variable to key = 1), I get:

synopsys_SS_1v_-40c_SS.lib
synopsys_TT_1v_85c_TT.lib
synopsys_SS_2v_-40c_SS.lib
synopsys_SS_2v_-40c_TT.lib
synopsys_FF_3v_25c_FF.lib
synopsys_FF_3v_-40c_SS.lib
synopsys_FF_3v_-40c_TT.lib
synopsys_TT_4v_125c_TT.lib
synopsys_TT_10v_85c_TT.lib
synopsys_SS_1.2v_-40c_SS.lib
synopsys_SS_1.4v_-40c_SS.lib

Expecting:

synopsys_SS_1v_-40c_SS.lib
synopsys_TT_1v_85c_TT.lib
synopsys_SS_1.2v_-40c_SS.lib
synopsys_SS_1.4v_-40c_SS.lib
synopsys_SS_2v_-40c_SS.lib
synopsys_SS_2v_-40c_TT.lib
synopsys_FF_3v_25c_FF.lib
synopsys_FF_3v_-40c_SS.lib
synopsys_FF_3v_-40c_TT.lib
synopsys_TT_4v_125c_TT.lib
synopsys_TT_10v_85c_TT.lib

Any suggestions are greatly appreciated.

Edit: I ended up using the simpler method described by @StephenRauch:

import re

def sort_names(format_ids, selector, key=1):
    if isinstance(key, int):
        key = [key]

    SELECTOR_RE = re.compile(selector)

    def convert(x):
        try:
            return float(x[:-1])
        except ValueError:
            return x

    def sort_keys(key):
        def split_fid(x):
            x = SELECTOR_RE.split(x)
            return tuple([convert(x[i]) for i in key])
        return split_fid

    format_ids.sort(key=sort_keys(key))

format_ids = ["synopsys_SS_2v_-40c_SS.lib",
              "synopsys_SS_1v_-40c_SS.lib",
              "synopsys_SS_1.2v_-40c_SS.lib",
              "synopsys_SS_1.4v_-40c_SS.lib",
              "synopsys_SS_2v_-40c_TT.lib",
              "synopsys_FF_3v_25c_FF.lib",
              "synopsys_TT_4v_125c_TT.lib",
              "synopsys_TT_1v_85c_TT.lib",
              "synopsys_TT_10v_85c_TT.lib",
              "synopsys_FF_3v_-40c_SS.lib",
              "synopsys_FF_3v_-40c_TT.lib"]

selector = r'.*(FF|TT|SS)_([-\.\d]+v)_([-\.\d]+c)_(FF|TT|SS).*'
key = [2,1,3]

sort_names(format_ids,selector,key)

Solution

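  • The test for numbers needs to change: text.isdigit() is false for strings such as '-40' and '1.2', so they were never converted. In addition, re.split() returns a leading '' when the string starts with a number, which was throwing off the convert routine.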
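
Both points can be checked directly. The following is a minimal sketch in Python 3 syntax; the sample strings are taken from the file names in the question, and the variable name parts is only for illustration:

```python
import re

# str.isdigit() is true only for strings made up entirely of digits,
# so a sign or a decimal point makes it return False.
print("40".isdigit())    # True
print("-40".isdigit())   # False
print("1.2".isdigit())   # False

# With a capturing group, re.split() keeps the matched pieces; a match
# at the very start of the string leaves an empty string in front.
parts = re.split(r"([-.\d]+)", "-40c")
print(parts)             # ['', '-40', 'c']

def convert(x):
    """Return x as a float when possible, otherwise return it unchanged."""
    try:
        return float(x)
    except ValueError:
        return x

# Dropping the empty pieces leaves the number first.
print([convert(p) for p in parts if p != ""])   # [-40.0, 'c']
```

Letting float() decide, rather than testing the characters beforehand, handles signs and decimal points without any extra cases.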

    Fixed Code:

    key = [2,1,3]
    
    def convert(x):
        try:
            return float(x)
        except ValueError:
            return x
    
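    # Split into numeric and non-numeric pieces and convert the numeric ones.
    alphanum_keys = lambda k: (convert(c) for c in re.split(r'([-.\d]+)', k))
    # Keep the first non-empty piece (skips the leading '' from re.split).
    alphanum_key = lambda k: [i for i in alphanum_keys(k) if i != ''][0]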
    split_list = lambda name: [
        alphanum_key(re.findall(selector, name)[0][i]) for i in key]
    format_ids.sort(key=split_list)
    

    Alternate (simpler) solution:

    But... all of those lambdas and regexes are way more complicated than this problem needs. How about just:

    def sort_key(keys):
    
        def convert(x):
            try:
                return float(x[:-1])
            except ValueError:
                return x
    
        def f(x):
            x = x.split('_')
            return tuple([convert(x[i]) for i in keys])
        return f
    
    format_ids.sort(key=sort_key([3, 2, 4]))
    

    How?

    sort_key() returns a function f(). This one-parameter function is passed to sort(), which calls it on each element to compute the value used for ordering. f() uses the keys value that was passed to sort_key(), because that variable is still in scope where f() is defined. This is called a closure.
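
A small sketch of the closure at work, in Python 3 syntax. The sort_key function is repeated from the simpler solution so the snippet runs on its own; the names by_temp, by_temp_then_volt and name are only for illustration:

```python
def sort_key(keys):
    def convert(x):
        # Drop the trailing unit letter ('v' or 'c') and try to parse a number.
        try:
            return float(x[:-1])
        except ValueError:
            return x

    def f(x):
        x = x.split('_')
        return tuple(convert(x[i]) for i in keys)
    return f

# Each call builds a separate f() that remembers its own keys.
by_temp = sort_key([3])
by_temp_then_volt = sort_key([3, 2])

name = "synopsys_SS_1.2v_-40c_SS.lib"
print(by_temp(name))             # (-40.0,)
print(by_temp_then_volt(name))   # (-40.0, 1.2)
```

Because the key list is captured when sort_key() is called, the same helper can produce any number of differently configured key functions without global state.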

    Results:

    synopsys_SS_1v_-40c_SS.lib
    synopsys_SS_1.2v_-40c_SS.lib
    synopsys_SS_1.4v_-40c_SS.lib
    synopsys_SS_2v_-40c_SS.lib
    synopsys_SS_2v_-40c_TT.lib
    synopsys_FF_3v_-40c_SS.lib
    synopsys_FF_3v_-40c_TT.lib
    synopsys_FF_3v_25c_FF.lib
    synopsys_TT_1v_85c_TT.lib
    synopsys_TT_10v_85c_TT.lib
    synopsys_TT_4v_125c_TT.lib