
Python RE Directories and slashes


Let's say the user has entered a root directory as a string:

'C:/Users/Me/'

Then I use os.listdir() and join each entry onto it with os.path.join() to build a list of subdirectories.

I end up with a list of strings like this:

'C:/Users/Me/Adir\Asubdir\'

and so on.
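The mixed separators come from the join step: on Windows, os.path is the ntpath module, and ntpath.join inserts a backslash unless the left-hand part already ends in a separator. A small sketch using ntpath directly (so it runs on any OS) with the directory names from above; level1 and level2 are just illustrative variable names:

```python
import ntpath  # the module os.path refers to on Windows; importable on any OS

root = "C:/Users/Me/"
level1 = ntpath.join(root, "Adir")       # root already ends in "/": nothing inserted
level2 = ntpath.join(level1, "Asubdir")  # no trailing separator: "\" inserted
print(level1)  # C:/Users/Me/Adir
print(level2)  # C:/Users/Me/Adir\Asubdir

# normpath rewrites every "/" as "\", giving a single separator style
print(ntpath.normpath(level2))  # C:\Users\Me\Adir\Asubdir
```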

I want to split the subdirectories and capture each directory name as its own element. Below is one attempt. I seem to be having trouble with the \ and / characters. I assume \ is an escape, so to me '[\\/]' says "match \ or /", and therefore '[\\/]([\w\s]+)[\\/]' as a pattern should match any word between two slashes... but the output is only ['/Users/'] and nothing else is matched. So I then added an escape for the forward slash:

'[\\\/]([\w\s]+)[\\\/]'

However, my output then becomes only ['Users','ADir'], which is confusing the crud out of me.
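Both symptoms can be reproduced with a short sketch. Two things are going on: in a normal (non-raw) string literal Python collapses "\\" to one backslash before re ever sees the pattern, and findall cannot reuse a separator that an earlier match already consumed. The second pattern, '[\\\/]...', happens to reach re as the same regex as the raw string r'[\\/]...', because Python keeps the unrecognised escape \/ as two characters (newer Python versions warn about it). The lookahead fix is a swapped-in technique, and the variable names are illustrative:

```python
import re

# The example path, with each "\" doubled so it survives the string literal
path = "C:/Users/Me/Adir\\Asubdir\\"

# In a normal literal "\\" becomes one backslash, so '[\\/]' reaches the
# regex engine as [\/]: an escaped "/", i.e. a class matching only "/".
cooked = '[\\/]'
raw = r'[\\/]'
print(cooked)  # [\/]   -> only "/"
print(raw)     # [\\/]  -> "\" or "/"

# Equivalent to the first attempt (written without invalid escapes)
slash_only = re.findall('[\\/]([\\w\\s]+)[\\/]', path)
print(slash_only)  # ['Users']

# Correct class, but each match consumes its closing separator, so that
# separator cannot also open the next match: "Me" and "Asubdir" are skipped.
consumed = re.findall(r'[\\/]([\w ]+)[\\/]', path)
print(consumed)  # ['Users', 'Adir']

# A lookahead (?=...) requires the closing separator without consuming it
lookahead = re.findall(r'[\\/]([\w ]+)(?=[\\/])', path)
print(lookahead)  # ['Users', 'Me', 'Adir', 'Asubdir']
```

Even the lookahead version skips "C:", since no separator precedes it.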

So my question is mainly: how do I tokenize each directory name from a string that uses both \ and /? And maybe also: why is my RE not working as I expect?
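If a regex is still wanted for the tokenizing, one option is to split on the separators instead of matching the names between them. A minimal sketch with re.split, on a made-up path in the same style as the question's:

```python
import re

path = "C:/Users/Me/ADir\\ASub Dir\\2 x 2 Dir\\"

# Split on runs of either separator; the trailing separator leaves an
# empty string at the end, which the comprehension filters out.
parts = [part for part in re.split(r"[\\/]+", path) if part]
print(parts)  # ['C:', 'Users', 'Me', 'ADir', 'ASub Dir', '2 x 2 Dir']
```

Unlike pathlib, this keeps the drive as plain 'C:' and knows nothing about UNC paths or other Windows path rules.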

Minimal Example:

import re, os

info = re.compile('[\\\/]([\w ]+)[\\\/]')


root = 'C:/Users/i12500198/Documents/Projects/'

def getFiles(wdir=os.getcwd()):
    files = (os.path.join(wdir,file) for file in os.listdir(wdir)
                 if os.path.isfile(os.path.join(wdir,file)))
    return list(files)

def getDirs(wdir=os.getcwd()):
    dirs = (os.path.join(wdir,adir) for adir in os.listdir(wdir)
                if os.path.isdir(os.path.join(wdir,adir)))
    return list(dirs)

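def walkSubdirs(root, below=None):
    # Recursively collect every directory under root. Use None rather than
    # a [] default: a default list is created once and shared by all calls.
    if below is None:
        below = []
    for aDir in getDirs(root):
        below.append(aDir)
        walkSubdirs(aDir, below)
    return below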

subdirs = walkSubdirs(root)
    
for aDir in subdirs:
    files = getFiles(aDir)
    for f in files:
        finfo = info.findall(f)
        print(f)
        print(finfo)
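The walkSubdirs/getFiles pair does by hand what the standard os.walk already does. A sketch of the same loop using os.walk plus pathlib to get the directory names above each file. files_with_parts is a hypothetical helper name, and the demo builds a throwaway tree with tempfile instead of reading real folders:

```python
import os
import tempfile
from pathlib import Path

def files_with_parts(root):
    """Yield (file path, directory names between root and the file)."""
    root = Path(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            # relative_to drops the root prefix; [:-1] drops the file name
            yield full, full.relative_to(root).parts[:-1]

# Demo tree: <tmp>/A dir/sub/x.txt
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "A dir", "sub"))
    open(os.path.join(tmp, "A dir", "sub", "x.txt"), "w").close()
    results = [parts for _, parts in files_with_parts(tmp)]

print(results)  # [('A dir', 'sub')]
```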

Solution

  • I want to split the subdirectories and capture each directory name as its own element

    Instead of regular expressions, I suggest you use one of Python's standard functions for parsing filesystem paths.

    Here is one using pathlib:

    from pathlib import PureWindowsPath
    
    # Raw string: otherwise "\2" would be read as the control character \x02
    p = PureWindowsPath(r"C:/Users/Me/ADir\ASub Dir\2 x 2 Dir")
    p.parts
    #=> ('C:\\', 'Users', 'Me', 'ADir', 'ASub Dir', '2 x 2 Dir')
    

    Note that the behaviour of pathlib.Path depends on the system running Python: on Windows it accepts both / and \ as separators, but on Linux and macOS it treats \ as an ordinary character. pathlib.PureWindowsPath always applies Windows rules, which is why I used it on my Linux machine; on Windows, Path gives the same parts.
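    The difference between the two rule sets can be seen directly with the "pure" path classes, which never touch the filesystem and behave the same on every OS. A small sketch on a shortened version of the example path:

```python
from pathlib import PurePosixPath, PureWindowsPath

raw = r"C:/Users/Me/ADir\ASub Dir"

# Windows rules: both "/" and "\" separate components
win_parts = PureWindowsPath(raw).parts
print(win_parts)    # ('C:\\', 'Users', 'Me', 'ADir', 'ASub Dir')

# POSIX rules: only "/" separates; "\" stays inside a name
posix_parts = PurePosixPath(raw).parts
print(posix_parts)  # ('C:', 'Users', 'Me', 'ADir\\ASub Dir')
```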