Tags: python, python-3.x, string, list, recursion

Convert tree-like structure formatted as a string using indentations to a list of paths


I have a string like this:

Given_String = """\
Shoes
Fruits
 Red
  Apple
  Cherry
 !
 Yellow
  Banana
  Grapes
   Small
   Big
  !
 !
!
"""

I want to convert it into a list of path strings:

["Shoes",
"Fruits/Red/Apple",
"Fruits/Red/Cherry",
"Fruits/Yellow/Banana",
"Fruits/Yellow/Grapes/Small",
"Fruits/Yellow/Grapes/Big"]

I tried removing the '!' lines and replacing the leading spaces with "/", but I couldn't get that approach to work.


Solution

  • Removing the "!" lines is a good idea: they only provide redundant information, because the indentation already shows where each level ends. But the leading spaces cannot simply be replaced with slashes. Instead, keep track of the current path as a list, and for each new line truncate that list so that its length does not exceed the number of leading spaces on that line, then append the line's text.

    Here is a generator function that does that. Note that it relies on your indentation increasing by exactly one space per level:

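    def get_paths(lines):
        path = []  # names from the root down to the most recent item
        for line in lines.splitlines():
            content = line.strip()
            if content in ("", "!"):
                continue  # blank lines and "!" terminators carry no information
            indent = len(line) - len(line.lstrip())
            if indent < len(path):
                # The new item is not a child of the previous one, so the
                # previous path ended at a leaf and can be emitted.
                yield "/".join(path)
            path[indent:] = [content]  # truncate to depth `indent`, then append
        if path:
            yield "/".join(path)  # the last item is always a leaf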
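
The slice assignment `path[indent:] = [content]` does the truncating and the appending in one step. A minimal illustration of that one statement, using a made-up list rather than the generator itself:

```python
# Illustration only: how slice assignment truncates a list and appends to it.
path = ["Fruits", "Red", "Cherry"]

# A line with one leading space ("Yellow") arrives: keep only the first
# item, then append the new name.
path[1:] = ["Yellow"]
print(path)  # ['Fruits', 'Yellow']

# A deeper line (two leading spaces) extends the path without removing anything.
path[2:] = ["Banana"]
print(path)  # ['Fruits', 'Yellow', 'Banana']
```

When the slice starts exactly at the end of the list, nothing is removed and the new name is simply appended, which is why the same statement handles both going deeper and going back up.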
    

    If you cannot rely on a consistent indentation step (a level might be indented by two spaces in one place and by three in another), you need to keep track of the indentation widths as well, using a stack:

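    def get_paths(lines):
        path = []
        indents = [0]  # stack of indentation widths, one per open level
        for line in lines.splitlines():
            content = line.strip()
            if content in ("", "!"):
                continue  # skip blank lines and "!" terminators
            indent = len(line) - len(line.lstrip())
            while indent < indents[-1]:
                indents.pop()  # close every level deeper than this line
            if indent > indents[-1]:
                indents.append(indent)  # deeper than before: open a new level
            elif path:
                yield "/".join(path)  # not a child: the previous item was a leaf
            path[len(indents)-1:] = [content]
        if path:
            yield "/".join(path)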
    

    Example use:

    s = """
    Shoes
    Fruits
     Red
      Apple
      Cherry
     !
     Yellow
      Banana
      Grapes
       Small
       Big
      !
     !
    !
    """
    
    result = list(get_paths(s))
    print(result)
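
For the string from the question, this produces the six paths listed there. To see the effect of the indentation stack, here is a self-contained sketch that runs the same approach on an unevenly indented input. The `irregular` sample is hypothetical (it is the question's tree with varying indentation steps and no "!" lines), and the function is repeated only so the block runs on its own:

```python
def get_paths(lines):
    # Same approach as the second function above, repeated so this block runs alone.
    path = []
    indents = [0]
    for line in lines.splitlines():
        content = line.strip()
        if content in ("", "!"):
            continue
        indent = len(line) - len(line.lstrip())
        while indent < indents[-1]:
            indents.pop()
        if indent > indents[-1]:
            indents.append(indent)
        elif path:
            yield "/".join(path)
        path[len(indents) - 1:] = [content]
    if path:
        yield "/".join(path)


# Hypothetical input: the steps between levels are 2, 3 and 1 spaces.
irregular = """\
Shoes
Fruits
  Red
     Apple
     Cherry
  Yellow
     Banana
     Grapes
      Small
      Big
"""

result = list(get_paths(irregular))
for p in result:
    print(p)
# Shoes
# Fruits/Red/Apple
# Fruits/Red/Cherry
# Fruits/Yellow/Banana
# Fruits/Yellow/Grapes/Small
# Fruits/Yellow/Grapes/Big
```

This sketch assumes that a line which dedents returns to an indentation width already used by one of its ancestors' levels; input that dedents to a width not seen before is not handled specially.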