I have the following text tree, and I want to get all the possible paths.
 subdirectory 1
   file11
   file12
   sub-sub-directory 1
     file111
     file112
 subdirectory 2
   file 21
   sub-sub-directory 21
   sub-sub-directory 22
     sub-sub-sub-directory 221
       file 2211
I'd like to print every possible path to the console, for example:
Ex1: subdirectory 1/sub-sub-directory 1/file111
Ex2: subdirectory 1/file12
Ex3: subdirectory 2/sub-sub-directory 22/sub-sub-sub-directory 221/file 2211
and so on for every entry in the tree. Any ideas? Thanks.
When I run the following on your example file (named file.txt here):
import re
from itertools import groupby

re_indent = re.compile(r"^\s*")

def key_func(line):
    # Group key: the number of leading whitespace characters
    return len(re_indent.match(line)[0])

with open("file.txt", "r") as file:
    # groupby bundles consecutive lines that share the same indentation
    for key, parts in groupby(file, key=key_func):
        print(f"{key}:", [part.strip() for part in parts])
I get essentially (apart from formatting):
indent  parts
------  -----------------------------------------------------------
     1  ['subdirectory 1']
     3  ['file11', 'file12', 'sub-sub-directory 1']
     5  ['file111', 'file112']
     1  ['subdirectory 2']
     3  ['file 21', 'sub-sub-directory 21', 'sub-sub-directory 22']
     5  ['sub-sub-sub-directory 221']
     7  ['file 2211']
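If you want to reproduce this table without creating file.txt, the sketch below feeds the same tree through io.StringIO instead of a real file. The TREE string is an assumption on my part: it reconstructs your example with 1 space of indentation at the top level and 2 more per level, which matches the widths in the table above.

```python
import io
import re
from itertools import groupby

re_indent = re.compile(r"^\s*")

def key_func(line):
    # Number of leading whitespace characters on the line
    return len(re_indent.match(line)[0])

# Hypothetical reconstruction of the example tree (1 + 2 * depth spaces)
TREE = (
    " subdirectory 1\n"
    "   file11\n"
    "   file12\n"
    "   sub-sub-directory 1\n"
    "     file111\n"
    "     file112\n"
    " subdirectory 2\n"
    "   file 21\n"
    "   sub-sub-directory 21\n"
    "   sub-sub-directory 22\n"
    "     sub-sub-sub-directory 221\n"
    "       file 2211\n"
)

# groupby only merges *consecutive* lines with the same indentation
groups = [
    (key, [part.strip() for part in parts])
    for key, parts in groupby(io.StringIO(TREE), key=key_func)
]
for key, names in groups:
    print(f"{key}:", names)
```

Note that "subdirectory 1" and "subdirectory 2" end up in separate groups even though they share indent 1, because groupby never looks past a change in the key.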
So, as long as the file sticks to this indentation scheme (1 space at the top level, plus 2 more spaces per level of nesting), you could try the following:
import re
from itertools import groupby

re_indent = re.compile(r"^\s*")

def key_func(line):
    # Group key: the number of leading whitespace characters
    return len(re_indent.match(line)[0])

paths = []
current = []  # names along the path to the most recent entry
with open("file.txt", "r") as file:
    for key, parts in groupby(file, key=key_func):
        # Indents are 1, 3, 5, ... spaces, i.e. depths 0, 1, 2, ...
        level = (key - 1) // 2
        for part in parts:
            # Keep only this entry's ancestors, then add the entry itself
            current = current[:level] + [part.strip()]
            paths.append("/".join(current))
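The level formula (key - 1) // 2 only works for exactly this indentation scheme. If your real files use a different (or inconsistent) indent width, keeping a stack of (indent, name) pairs avoids hard-coding it. This is a swapped-in technique, not a refinement of the groupby code; the function name iter_paths is hypothetical, and the sketch assumes a child is always indented strictly deeper than its parent and that blank lines can be skipped.

```python
import re

RE_INDENT = re.compile(r"^[ \t]*")

def iter_paths(lines):
    """Yield 'a/b/c' style paths from an indented text tree (hypothetical helper)."""
    stack = []  # (indent, name) pairs for the ancestors of the current line
    for line in lines:
        name = line.strip()
        if not name:
            continue  # skip blank lines
        indent = len(RE_INDENT.match(line)[0])
        # Drop entries at the same or deeper indentation: they are not ancestors
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, name))
        yield "/".join(n for _, n in stack)

# Indented by 4 spaces per level instead of 2, to show the width doesn't matter
sample = [
    "subdirectory 2\n",
    "    file 21\n",
    "    sub-sub-directory 22\n",
    "        sub-sub-sub-directory 221\n",
    "            file 2211\n",
]
for path in iter_paths(sample):
    print(path)
```

To read your actual file, pass the open file object instead of sample, e.g. `with open("file.txt") as file:` and then loop over `iter_paths(file)`. Tabs count as one character each here, so a file that mixes tabs and spaces may still confuse it.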
to get
paths = [
'subdirectory 1',
'subdirectory 1/file11',
'subdirectory 1/file12',
'subdirectory 1/sub-sub-directory 1',
'subdirectory 1/sub-sub-directory 1/file111',
'subdirectory 1/sub-sub-directory 1/file112',
'subdirectory 2',
'subdirectory 2/file 21',
'subdirectory 2/sub-sub-directory 21',
'subdirectory 2/sub-sub-directory 22',
'subdirectory 2/sub-sub-directory 22/sub-sub-sub-directory 221',
'subdirectory 2/sub-sub-directory 22/sub-sub-sub-directory 221/file 2211'
]
which seems to be what you are looking for?
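Your Ex1 to Ex3 all stop at the deepest entry of a branch. If that is all you want, one option is to keep only the paths that are not an ancestor of some other path. One caveat: a plain text tree can't tell an empty directory from a file, so 'subdirectory 2/sub-sub-directory 21' is kept as well. The helper name leaf_paths is hypothetical.

```python
def leaf_paths(paths):
    """Keep only paths that have no children in `paths` (hypothetical helper)."""
    # Collect every proper ancestor, e.g. 'a' and 'a/b' for 'a/b/c'
    ancestors = set()
    for path in paths:
        parts = path.split("/")
        for i in range(1, len(parts)):
            ancestors.add("/".join(parts[:i]))
    return [path for path in paths if path not in ancestors]

paths = [
    'subdirectory 1',
    'subdirectory 1/file11',
    'subdirectory 1/file12',
    'subdirectory 1/sub-sub-directory 1',
    'subdirectory 1/sub-sub-directory 1/file111',
    'subdirectory 1/sub-sub-directory 1/file112',
    'subdirectory 2',
    'subdirectory 2/file 21',
    'subdirectory 2/sub-sub-directory 21',
    'subdirectory 2/sub-sub-directory 22',
    'subdirectory 2/sub-sub-directory 22/sub-sub-sub-directory 221',
    'subdirectory 2/sub-sub-directory 22/sub-sub-sub-directory 221/file 2211',
]
for path in leaf_paths(paths):
    print(path)
```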