Let's say I have a string that is a root directory that has been entered
'C:/Users/Me/'
Then I use os.listdir()
and join with it to create a list of subdirectories.
I end up with a list of strings that are like below:
'C:/Users/Me/Adir\Asubdir\'
and so on.
I want to split the subdirectories and capture each directory name as its own element. Below is one attempt. I am seemingly having issues with the \
and /
characters. I assume \
is escaping, so '[\\/]'
to me that says look for \
or /
so then '[\\/]([\w\s]+)[\\/]'
as a match pattern should look for any word between two slashes... but the output is only ['/Users/']
and nothing else is matched. So I then I add a escape for the forward slash.
'[\\\/]([\w\s]+)[\\\/]'
However, my output then only becomes ['Users','ADir']
so that is confusing the crud out of me.
My question is namely how do I tokenize each directory from a string using both \
and /
but maybe also why is my RE not working as I expect?
Minimal Example:
import re, os
info = re.compile('[\\\/]([\w ]+)[\\\/]')
root = 'C:/Users/i12500198/Documents/Projects/'
def getFiles(wdir=os.getcwd()):
files = (os.path.join(wdir,file) for file in os.listdir(wdir)
if os.path.isfile(os.path.join(wdir,file)))
return list(files)
def getDirs(wdir=os.getcwd()):
dirs = (os.path.join(wdir,adir) for adir in os.listdir(wdir)
if os.path.isdir(os.path.join(wdir,adir)))
return list(dirs)
def walkSubdirs(root,below=[]):
subdirs = getDirs(root)
for aDir in subdirs:
below.append(aDir)
walkSubdirs(aDir,below)
return below
subdirs = walkSubdirs(root)
for aDir in subdirs:
files = getFiles(aDir)
for f in files:
finfo = info.findall(f)
print(f)
print(finfo)
I want to split the subdirectories and capture each directory name as its own element
Instead of regular expressions, I suggest you use one of Python's standard functions for parsing filesystem paths.
Here is one using pathlib
:
from pathlib import Path
p = Path("C:/Users/Me/ADir\ASub Dir\2 x 2 Dir\\")
p.parts
#=> ('C:\\', 'Users', 'Me', 'ADir', 'ASub Dir\x02 x 2 Dir')
Note that the behaviour of pathlib.Path
depends on the system running Python. Since I'm on a Linux machine, I actually used pathlib.PureWindowsPath
here. I believe the output should be accurate for those of you on Windows.