Search code examples
pythonsortingglobnatural-sort

How to sort glob.glob containing numbers?


Doing

glob.glob('/var/log/apache2/other_vhosts_access.log*')

gives an unsorted list such as ['....76.gz', '....16.gz', '....46.gz', ...]. Also,

sorted(glob.glob('/var/log/apache2/other_vhosts_access.log*')) 

gives

other_vhosts_access.log
other_vhosts_access.log.1
other_vhosts_access.log.10.gz
other_vhosts_access.log.11.gz
other_vhosts_access.log.12.gz
...
other_vhosts_access.log.19.gz
other_vhosts_access.log.2.gz

How to have a better sort? .log, .log.1, .log.2.gz, ..., .log.9.gz, .log.10.gz, ...


Solution

  • To expand on my comment, perhaps something like this will do. This pulls the first sequence of digits found between decimal points or at the end of the file and uses the value as the primary sort key, and the full file name for secondary.

    file_list = """
    other_vhosts_access.log
    other_vhosts_access.log.1
    other_vhosts_access.log.10.gz
    other_vhosts_access.log.11.gz
    other_vhosts_access.log.12.gz
    other_vhosts_access.log.19.gz
    other_vhosts_access.log.2.gz
    """.strip().split()
    
    import re
    
    re_num = r"\.(\d+)(\.|$)"
    
    def sort_key(file_name):
        match=re.search(re_num,file_name)
        if match is None:
            return(0,file_name)
        else:
            return(int(match.group(1)),file_name)
        
    print(*sorted(file_list,key=sort_key),sep='\n')