
How can a Markdown list be parsed to a dictionary in Python?


I have a list such as the following:

- launchers
   - say hello
      - command: echo "hello" | festival --tts
      - icon: sayHello.png
   - say world
      - command: echo "world" | festival --tts
      - icon: sayWorld.png
   - wait
      - command: for ((x = 0; x < 10; ++x)); do :; done
      - icon: wait.png

I would like to parse it to a dictionary like the following:

{
    "launchers": {
        "say hello": {
            "command": "echo \"hello\" | festival --tts",
            "icon": "sayHello.png"
        },
        "say world": {
            "command": "echo \"world\" | festival --tts",
            "icon": "sayWorld.png"
        },
        "wait": {
            "command": "for ((x = 0; x < 10; ++x)); do :; done",
            "icon": "wait.png"
        }
    }
}

I've started on some very manual code that counts leading spaces (e.g. len(line.rstrip()) - len(line.rstrip().lstrip())), but I'm wondering if there is a more sensible way of approaching this. I am aware that JSON can be imported into Python, but this doesn't suit my purposes. So, how can a Markdown list in a file be parsed to a dictionary in Python? Is there an efficient way of doing this?

Here's some basic code I'm playing with now:

with open("configuration.md") as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        indentation = len(line) - len(line.lstrip())
        # split only on the first '-' and the first ':' so that values
        # such as `festival --tts` or `do :; done` stay intact
        listItem = line.split('-', 1)[1].strip()
        key, _, value = listItem.partition(':')
        print(indentation, key.strip(), value.strip())

Solution

  • I'd assume a more rigid format and use a stack and a regular expression:

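    import re

    # one match per list item: leading spaces, item name, optional ": value"
    line = re.compile(r'( *)- ([^:\n]+)(?:: ([^\n]*))?\n?')
    depth = 0
    stack = [{}]  # stack[-1] is the dictionary currently being filled
    # inputtext holds the whole Markdown list as one string
    for indent, name, value in line.findall(inputtext):
        indent = len(indent)
        if indent > depth:
            # deeper items are only valid directly under a new, empty branch
            assert not stack[-1], 'unexpected indent'
        elif indent < depth:
            # dedent: leave the current branch (one level per dedent)
            stack.pop()
        stack[-1][name] = value or {}
        if not value:
            # new branch
            stack.append(stack[-1][name])
        depth = indent

    result = stack[0]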
    

    This produces:

    >>> import re
    >>> inputtext = '''\
    ... - launchers
    ...    - say hello
    ...       - command: echo "hello" | festival --tts
    ...       - icon: sayHello.png
    ...    - say world
    ...       - command: echo "world" | festival --tts
    ...       - icon: sayWorld.png
    ...    - wait
    ...       - command: for ((x = 0; x < 10; ++x)); do :; done
    ...       - icon: wait.png
    ... '''
    >>> line = re.compile(r'( *)- ([^:\n]+)(?:: ([^\n]*))?\n?')
    >>> depth = 0
    >>> stack = [{}]
    >>> for indent, name, value in line.findall(inputtext):
    ...     indent = len(indent)
    ...     if indent > depth:
    ...         assert not stack[-1], 'unexpected indent'
    ...     elif indent < depth:
    ...         stack.pop()
    ...     stack[-1][name] = value or {}
    ...     if not value:
    ...         # new branch
    ...         stack.append(stack[-1][name])
    ...     depth = indent
    ... 
    {'command': 'echo "hello" | festival --tts', 'icon': 'sayHello.png'}
    {'command': 'echo "world" | festival --tts', 'icon': 'sayWorld.png'}
    >>> result = stack[0]
    >>> from pprint import pprint
    >>> pprint(result)
    {'launchers': {'say hello': {'command': 'echo "hello" | festival --tts',
                                 'icon': 'sayHello.png'},
                   'say world': {'command': 'echo "world" | festival --tts',
                                 'icon': 'sayWorld.png'},
                   'wait': {'command': 'for ((x = 0; x < 10; ++x)); do :; done',
                            'icon': 'wait.png'}}}
    

    from your input text.