python, python-2.7, sorting

Sort a list of paths in Python


How can I sort a list of paths that contain integers as well as strings? My file names are:

tmp_1483228800-1485907200_0, 
tmp_1483228800-1485907200_1,
tmp_1483228800-1485907200_2,
.... 

I need to sort them according to the integer after the last underscore. This is what my code looks like:

act = "." + "/*/raw_results.csv"
files = glob.glob(act)
sorted_list = sorted(files, key = lambda x:int(os.path.splitext(os.path.dirname(x))[0]))

I know the problem is that the names contain several integers with some strings in between, so the whole name cannot be converted to an integer, but I do not know how to solve it. Thanks in advance.


Solution

  • According to your comments, your files will be in this format:

    >>> files = [".../data/tmp_1483228801-1485907200_10/raw_results.csv",
                 ".../data/tmp_1483228800-1485907200_1/raw_results.csv",
                 ".../data/tmp_1483228801-1485907201_30/raw_results.csv",
                 ".../data/tmp_1483228801-1485907200_2/raw_results.csv",
                 ".../data/tmp_1483228801-1485907201_9/raw_results.csv"]
    

    You can then just extract all the numbers in those full, raw file paths, and convert those to int. No need to split the path up into directory path segments.

    >>> import re
    >>> [[int(n) for n in re.findall(r"\d+", f)] for f in files]
    [[1483228801, 1485907200, 10],
     [1483228800, 1485907200, 1],
     [1483228801, 1485907201, 30],
     [1483228801, 1485907200, 2],
     [1483228801, 1485907201, 9]]
    

    Used as a sort key, this turns each path into a list of all the numbers it contains. Python compares lists element by element, so the first number found has the highest priority. If the leading numbers are all the same, the last number decides the order; if they differ, the paths are sorted by those first.

    >>> sorted(files, key=lambda f: [int(n) for n in re.findall(r"\d+", f)])
    ['.../data/tmp_1483228800-1485907200_1/raw_results.csv',
     '.../data/tmp_1483228801-1485907200_2/raw_results.csv',
     '.../data/tmp_1483228801-1485907200_10/raw_results.csv',
     '.../data/tmp_1483228801-1485907201_9/raw_results.csv',
     '.../data/tmp_1483228801-1485907201_30/raw_results.csv']
    

    If that's not what you want, you can use key=lambda f: [int(n) for n in re.findall(r"\d+", f)][-1] to sort by the last number only. This is slightly wasteful, since it converts every number in the path and then discards all but the last.
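
As an alternative to the regex, here is a minimal sketch that sorts by only the integer after the last underscore, which is what the question asks for. It assumes every path has the form shown above, with the number at the end of the parent directory name. The helper name trailing_index is hypothetical, and the sample paths are the ones from the answer.

```python
import os


def trailing_index(path):
    """Return the integer after the last underscore of the parent directory name."""
    # ".../data/tmp_1483228801-1485907200_10/raw_results.csv"
    #   -> "tmp_1483228801-1485907200_10"
    dirname = os.path.basename(os.path.dirname(path))
    # maxsplit=1 splits only at the last underscore; the final piece is the index.
    return int(dirname.rsplit("_", 1)[-1])


files = [".../data/tmp_1483228801-1485907200_10/raw_results.csv",
         ".../data/tmp_1483228800-1485907200_1/raw_results.csv",
         ".../data/tmp_1483228801-1485907201_30/raw_results.csv",
         ".../data/tmp_1483228801-1485907200_2/raw_results.csv",
         ".../data/tmp_1483228801-1485907201_9/raw_results.csv"]

# Sort by the trailing index only, ignoring the two timestamps.
by_index = sorted(files, key=trailing_index)
```

With the sample paths, the trailing indices come out in the order 1, 2, 9, 10, 30. A directory name that does not end in an underscore followed by digits makes int() raise ValueError, so the sketch only fits inputs that follow the naming scheme.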