
Get file paths from a text tree in Python


I have the following text tree, and I want to get all the possible paths.

 subdirectory 1
   file11
   file12
   sub-sub-directory 1
     file111
     file112
 subdirectory 2
   file 21
   sub-sub-directory 21
   sub-sub-directory 22    
     sub-sub-sub-directory 221
       file 2211

I expect console output listing every valid path, for example:

Ex1: Subdirectory 1/Sub-sub-directory 1/file111

Ex2: Subdirectory 1/file12

Ex3: Subdirectory 2/sub-sub-directory 22/sub-sub-sub-directory 221/file 2211

and so on for every combination. Any ideas? Thanks.


Solution

  • When I run the following on your example file (named file.txt here):

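    import re
    from itertools import groupby
    
    re_indent = re.compile(r"^\s*")
    def key_func(line):
        # Indentation width: the number of leading whitespace characters.
        return len(re_indent.match(line)[0])
    
    with open("file.txt", "r") as file:
        # groupby bundles consecutive lines that share the same indentation.
        for key, parts in groupby(file, key=key_func):
            print(f"{key}:", [part.strip() for part in parts])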
    

    I get essentially (apart from formatting):

      indent  parts
    --------  -----------------------------------------------------------
           1  ['subdirectory 1']
           3  ['file11', 'file12', 'sub-sub-directory 1']
           5  ['file111', 'file112']
           1  ['subdirectory 2']
           3  ['file 21', 'sub-sub-directory 21', 'sub-sub-directory 22']
           5  ['sub-sub-sub-directory 221']
           7  ['file 2211']
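
The indentation rule visible in this table (1 leading space at the top level, 2 more for each nested level) can be turned into a nesting depth with integer division. This is only a minimal sketch of that mapping; `depth_of` and `sample` are hypothetical names used for illustration, and the 1 + 2 × depth layout is an assumption taken from the example tree:

```python
import re

RE_INDENT = re.compile(r"^[ \t]*")

def depth_of(line: str) -> int:
    """Map leading-whitespace width to depth, assuming widths 1, 3, 5, ..."""
    indent = len(RE_INDENT.match(line)[0])
    return (indent - 1) // 2

# A few lines taken from the example tree, one per indentation level.
sample = [
    " subdirectory 1",
    "   file11",
    "     file111",
    "       file 2211",
]
depths = [depth_of(line) for line in sample]
print(depths)  # [0, 1, 2, 3]
```

If the file used a different indentation step, the constants in `depth_of` would have to change accordingly.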
    

    So, as long as the file sticks to this indentation rule, you could try the following:

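    import re
    from itertools import groupby
    
    re_indent = re.compile(r"^\s*")
    def key_func(line):
        # Indentation width: the number of leading whitespace characters.
        return len(re_indent.match(line)[0])
    
    last_key = -1
    paths = []
    # Ancestors of the current group, as a list and as a joined path.
    current, current_str = [], ""
    with open("file.txt", "r") as file:
        # Skip blank lines so they are not mistaken for entries.
        lines = (line for line in file if line.strip())
        for key, parts in groupby(lines, key=key_func):
            if key < last_key:
                # Moving back up: keep only the ancestors above this level
                # (indent 1 -> depth 0, indent 3 -> depth 1, ...).
                current = current[:(key - 1) // 2]
                current_str = "/".join(current)
            for part in parts:
                paths.append(f"{current_str}/{part.strip()}".lstrip("/"))
            # The last entry of a group is the parent of a deeper group that follows.
            current.append(part.strip())
            current_str = paths[-1]
            last_key = key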
    

    to get

    paths = [
     'subdirectory 1',
     'subdirectory 1/file11',
     'subdirectory 1/file12',
     'subdirectory 1/sub-sub-directory 1',
     'subdirectory 1/sub-sub-directory 1/file111',
     'subdirectory 1/sub-sub-directory 1/file112',
     'subdirectory 2',
     'subdirectory 2/file 21',
     'subdirectory 2/sub-sub-directory 21',
     'subdirectory 2/sub-sub-directory 22',
     'subdirectory 2/sub-sub-directory 22/sub-sub-sub-directory 221',
     'subdirectory 2/sub-sub-directory 22/sub-sub-sub-directory 221/file 2211'
    ]
    

    which seems to be what you are looking for?
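
If the indentation step is not guaranteed to be exactly 2, a stack of ancestors is one alternative to the `groupby` approach above. The following is a rough sketch and not part of the original answer: `tree_paths` and the inline `tree` string are hypothetical, it assumes indentation uses one kind of whitespace consistently, and it treats any entry without children as a leaf, so an empty directory is reported as if it were a file.

```python
def tree_paths(lines):
    """Return (path, is_leaf) pairs for an indented text tree."""
    stack = []    # (indent, name, index into entries) for each ancestor
    entries = []  # [path, is_leaf]; is_leaf is cleared once a child appears
    for raw in lines:
        name = raw.strip()
        if not name:
            continue  # ignore blank lines
        indent = len(raw) - len(raw.lstrip())
        # Pop everything that is not an ancestor (same or deeper indentation).
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            entries[stack[-1][2]][1] = False  # the parent now has a child
        path = "/".join([item[1] for item in stack] + [name])
        stack.append((indent, name, len(entries)))
        entries.append([path, True])
    return [(path, is_leaf) for path, is_leaf in entries]

tree = """\
 subdirectory 1
   file11
   file12
   sub-sub-directory 1
     file111
     file112
 subdirectory 2
   file 21
   sub-sub-directory 21
   sub-sub-directory 22
     sub-sub-sub-directory 221
       file 2211
"""

all_paths = tree_paths(tree.splitlines())
leaf_paths = [path for path, is_leaf in all_paths if is_leaf]
for path in leaf_paths:
    print(path)
```

Printing `leaf_paths` instead of every path gives only the entries that have nothing nested beneath them, which is closer to the "file paths" asked for in the question. To read from a file, the open file object could be passed to `tree_paths` in place of `tree.splitlines()`.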