python regex sorting glob

How to change the sort order of files returned by the glob module in Python


After applying a regex to the file names in a directory that start with 'chr[0-9XY]*', I obtain a list in the following order:

['chr9', 'chr8', 'chr7', 'chr6', 'chr5', 'chr4', 'chr3', 'chr2', 'chr1', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr20', 'chr21', 'chr22', 'chrX', 'chrY']

I used glob.glob to iterate over the desired files in the directory, and this is the order in which it returns them.

My question is whether it is possible to make glob return the files in a different order: sorted numerically by the integer part, with X and Y at the end. Like this:

['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr20', 'chr21', 'chr22', 'chrX', 'chrY']

Is there any way to accomplish that? Thanks in advance!


Solution

  • You can use a third-party library called natsort, so named because it sorts elements in natural order.

    You can install it with pip install natsort. You will need pip; if it is not already installed, a quick search will turn up an installation guide for your system.

    Once this is done, you can easily use natsort to do all the work for you:

    >>> import natsort
    >>> var = ['chr9', 'chr8', 'chr7', 'chr6', 'chr5', 'chr4', 'chr3', 'chr2', 'chr1', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr20', 'chr21', 'chr22', 'chrX', 'chrY']
    >>> natsort.natsorted(var)
    ['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr20', 'chr21', 'chr22', 'chrX', 'chrY']
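
If installing a third-party package is not an option, a similar ordering can be had from the standard library alone. The sketch below is not part of natsort; natural_key is a hypothetical helper name chosen for illustration. It splits each name into text and digit runs so that the digit runs compare as integers. Note that glob itself does not sort: its documentation says results come back in arbitrary order, so the sort has to be applied to the returned list.

```python
import re

def natural_key(name):
    """Hypothetical helper: build a sort key in which digit runs compare as integers."""
    # re.split with a capturing group keeps the digit runs in the result,
    # so the parts always alternate: text, digits, text, digits, ...
    parts = re.split(r'(\d+)', name)
    # Odd indices hold the captured digit runs; convert those to int.
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]

names = ['chr9', 'chr8', 'chr10', 'chr2', 'chr1', 'chr22', 'chrY', 'chrX']
ordered = sorted(names, key=natural_key)
print(ordered)
```

With glob, the same key can be passed directly, for example sorted(glob.glob('chr[0-9XY]*'), key=natural_key). In this key, 'chrX' and 'chrY' land after the numbered names because the text part 'chr' compares less than 'chrX'.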