python cross-platform glob case-sensitive pathlib

Make pathlib's Path.glob() and Path.rglob() case-insensitive for platform-agnostic application


I am using pathlib's Path.glob() and Path.rglob() to match files in a directory and in its subdirectories, respectively. The target files have both lower-case .txt and upper-case .TXT extensions. The file paths are read from the file system as follows:

import pathlib

directory = pathlib.Path()
files_to_create = ['a.txt', 'b.TXT']
suffixes_to_test = ['*.txt', '*.TXT']

for filename in files_to_create:
    filepath = directory / filename
    filepath.touch()
    
for suffix in suffixes_to_test:
    files = [fp.relative_to(directory) for fp in directory.glob(suffix)]
    print(f'{suffix}: {files}')

The majority of the code base was developed on a Windows 10 machine (running Python 3.7.4) and has now been moved to macOS Monterey 12.0.1 (running Python 3.10.1).

On Windows, both files a.txt and b.TXT match each pattern:

*.txt: [WindowsPath('a.txt'), WindowsPath('b.TXT')]
*.TXT: [WindowsPath('a.txt'), WindowsPath('b.TXT')]

In contrast, on macOS only one file matches each pattern:

*.txt: [PosixPath('a.txt')]
*.TXT: [PosixPath('b.TXT')]

Therefore, I assume that the macOS file system is case-sensitive, whereas the Windows one is not. According to Apple's User Guide, the macOS file system is not case-sensitive by default but can be configured as such. Something similar might apply to Linux or Unix file systems, as discussed here and here.

Regardless of the reason for this differing behavior, I need a platform-agnostic way to get both upper-case TXT and lower-case txt files. A rather naive workaround could be something like this:

results = {fp.relative_to(directory) for suffix in suffixes_to_test for fp in directory.glob(suffix)}

This gives the desired output on both macOS and Windows (shown here for macOS):

{PosixPath('b.TXT'), PosixPath('a.txt')}
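A variant of this workaround, given only as a rough sketch, avoids listing every spelling of the pattern: iterate over the directory entries and compare lower-cased file names against a lower-cased pattern with fnmatch.fnmatchcase(), which never applies platform-specific case normalization. The helper name iter_matching and its recursive parameter are hypothetical, not part of pathlib.

```python
import fnmatch
import pathlib


def iter_matching(directory, pattern, recursive=False):
    """Yield paths whose file name matches `pattern`, ignoring case.

    Hypothetical helper: only the final path component is compared,
    so `pattern` should be a file-name pattern such as '*.txt'.
    """
    directory = pathlib.Path(directory)
    candidates = directory.rglob('*') if recursive else directory.iterdir()
    lowered = pattern.lower()
    for fp in candidates:
        # fnmatchcase() does not normalize case on any platform, so
        # lower-casing both sides behaves the same everywhere.
        if fnmatch.fnmatchcase(fp.name.lower(), lowered):
            yield fp
```

Called as iter_matching(directory, '*.txt'), this should yield both a.txt and b.TXT on either platform, at the cost of visiting every entry instead of letting glob() filter.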

However, is there a more elegant way? I could not find any option like ignore_case in pathlib's documentation.


Solution

  • What about something like:

    suffix = '*.[tT][xX][tT]'
    files = [fp.relative_to(directory) for fp in directory.glob(suffix)]
    

    This does not generalize to a fully case-insensitive glob, but it works well for a limited, specific use case such as matching a single extension.
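The character-class idea can be pushed a little further by building the pattern automatically. The sketch below is one possible way to do that, not an official pathlib feature: the helper names case_insensitive_pattern and glob_ignore_case are hypothetical, and the pattern builder assumes the input pattern contains no bracket expressions of its own. On Python 3.12 and later, glob() and rglob() accept a case_sensitive keyword argument, so the sketch uses that when available and falls back to the rewritten pattern otherwise.

```python
import pathlib
import sys


def case_insensitive_pattern(pattern):
    """Rewrite each cased letter as a two-letter class, e.g. 't' -> '[tT]'.

    Assumption: `pattern` has no '[...]' expressions of its own, since
    letters inside an existing class would be wrapped a second time.
    """
    return ''.join(
        f'[{ch.lower()}{ch.upper()}]' if ch.lower() != ch.upper() else ch
        for ch in pattern
    )


def glob_ignore_case(directory, pattern, recursive=False):
    """Return an iterator of paths matching `pattern`, ignoring case."""
    directory = pathlib.Path(directory)
    method = directory.rglob if recursive else directory.glob
    if sys.version_info >= (3, 12):
        # The case_sensitive keyword was added to glob()/rglob() in 3.12.
        return method(pattern, case_sensitive=False)
    # Older versions: emulate it with explicit character classes.
    return method(case_insensitive_pattern(pattern))
```

For the example in the question, case_insensitive_pattern('*.txt') produces '*.[tT][xX][tT]', and glob_ignore_case(directory, '*.txt') should return both a.txt and b.TXT on Windows and macOS alike.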