python glob pathlib natsort

How to sort 'WindowsPath' object files naturally


I am iterating through files in a directory using Path().glob(), but the files do not come back in natural order. For example, the iteration order looks like this:

[WindowsPath('C:/Users/HP/Desktop/P1/dataP1/SAMPLED_NORMALIZED/P1_Cor.csv'),
 WindowsPath('C:/Users/HP/Desktop/P10/dataP10/SAMPLED_NORMALIZED/P10_Cor.csv'),
 WindowsPath('C:/Users/HP/Desktop/P11/dataP11/SAMPLED_NORMALIZED/P11_Cor.csv'),
 WindowsPath('C:/Users/HP/Desktop/P12/dataP12/SAMPLED_NORMALIZED/P12_Cor.csv'),
# ...and so on from P1 to P30

What I want is for it to iterate in this order: P1, P2, P3 and so on.

I have tried using the code below but it gives me an error:

from pathlib import Path

file_path = r'C:/Users/HP/Desktop'

files = Path(file_path).glob(file)
sorted(files, key=lambda name: int(name[10:]))

where 10 is just an arbitrary number I picked while trying out the code.

The error:

TypeError: 'WindowsPath' object is not subscriptable

Ultimately, what I want is to iterate through the files and do something with each file:

from pathlib import Path

for i, fl in enumerate(Path(file_path).glob(file)):
    # do something

I have also tried the natsort library, but it does not order the files correctly in the iteration either. I have tried:

from natsort import natsort_keygen, ns
natsort_key1 = natsort_keygen(key=lambda y: y.lower())
from natsort import natsort_keygen, ns
natsort_key2 = natsort_keygen(alg=ns.IGNORECASE)

Both snippets above still give me P1, P10, P11 and so on.

Any help would really be appreciated.


Solution

  • If you want to sort by the digits in the file name, you can use the Path.name attribute and a regular expression that extracts the digits.

    from pathlib import Path
    import re
    
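    # Top-level folder that holds the P1, P2, ... directories shown in the question
    file_path = r'C:/Users/HP/Desktop'
    
    def _p_file_sort_key(path):
        """Given a file named P(digits)_Cor.csv, return digits as an int."""
        return int(re.match(r"P(\d+)", path.name).group(1))
    
    # Each file sits in its own P<n>/dataP<n>/SAMPLED_NORMALIZED folder,
    # so the pattern has to span those directories.
    files = sorted(
        Path(file_path).glob("P*/dataP*/SAMPLED_NORMALIZED/P*_Cor.csv"),
        key=_p_file_sort_key,
    )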
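
  • If the file names do not all follow the P(digits) pattern, a more general natural-sort key may be useful. The TypeError in the question comes from slicing a Path object directly; converting it with str() (or using .name) gives a string that can be sliced or split. The sketch below is one possible approach, not the only one: natural_key is a hypothetical helper name, and the sample paths are rebuilt from the listing in the question with PureWindowsPath so that the example runs without touching the disk.

```python
import re
from pathlib import PureWindowsPath


def natural_key(path):
    """Build a sort key in which runs of digits compare as integers."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so the odd positions are always the digit runs.
    parts = re.split(r"(\d+)", str(path))
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


# Sample paths modelled on the question's listing, deliberately out of order.
paths = [
    PureWindowsPath(
        f"C:/Users/HP/Desktop/P{n}/dataP{n}/SAMPLED_NORMALIZED/P{n}_Cor.csv"
    )
    for n in (10, 2, 1, 11)
]

ordered = sorted(paths, key=natural_key)
names = [p.name for p in ordered]

for i, fl in enumerate(ordered):
    print(i, fl.name)  # do something with each file here
```

    With real files, the same key can be passed to sorted() around the Path(file_path).glob(...) call, since glob() yields Path objects and the key only relies on str(path).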