I have a list such as the following:
- launchers
    - say hello
        - command: echo "hello" | festival --tts
        - icon: sayHello.png
    - say world
        - command: echo "world" | festival --tts
        - icon: sayWorld.png
    - wait
        - command: for ((x = 0; x < 10; ++x)); do :; done
        - icon: wait.png
I would like to parse it to a dictionary like the following:
{
    "launchers": {
        "say hello": {
            "command": "echo \"hello\" | festival --tts",
            "icon": "sayHello.png"
        },
        "say world": {
            "command": "echo \"world\" | festival --tts",
            "icon": "sayWorld.png"
        },
        "wait": {
            "command": "for ((x = 0; x < 10; ++x)); do :; done",
            "icon": "wait.png"
        }
    }
}
I've started on some very manual code that counts leading spaces (e.g. len(line.rstrip()) - len(line.rstrip().lstrip())), but I'm wondering if there is a more sensible way of approaching this. I am aware that JSON can be imported into Python, but this doesn't suit my purposes. So, how can a Markdown list in a file be parsed to a dictionary in Python? Is there an efficient way of doing this?
Here's some basic code I'm playing with now:
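with open("configuration.md", 'r') as configuration:
    for line in configuration:
        indentation = len(line.rstrip()) - len(line.rstrip().lstrip())
        # Split on the first '-' and the first ':' only, so commands such as
        # "festival --tts" or "do :; done" are not cut apart.
        listItem = line.split('-', 1)[1].strip()
        listItemSplit = listItem.split(':', 1)
        key = listItemSplit[0].strip()
        if len(listItemSplit) == 2:
            value = listItemSplit[1].strip()
        else:
            value = ""
        print(indentation, key, value)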
I'd assume a more rigid format and use a stack and a regular expression:
import re
line = re.compile(r'( *)- ([^:\n]+)(?:: ([^\n]*))?\n?')
depth = 0
stack = [{}]
for indent, name, value in line.findall(inputtext):
    indent = len(indent)
    if indent > depth:
        assert not stack[-1], 'unexpected indent'
    elif indent < depth:
        stack.pop()
    stack[-1][name] = value or {}
    if not value:
        # new branch
        stack.append(stack[-1][name])
    depth = indent
result = stack[0]
This produces:
>>> import re
>>> inputtext = '''\
... - launchers
...     - say hello
...         - command: echo "hello" | festival --tts
...         - icon: sayHello.png
...     - say world
...         - command: echo "world" | festival --tts
...         - icon: sayWorld.png
...     - wait
...         - command: for ((x = 0; x < 10; ++x)); do :; done
...         - icon: wait.png
... '''
>>> line = re.compile(r'( *)- ([^:\n]+)(?:: ([^\n]*))?\n?')
>>> depth = 0
>>> stack = [{}]
>>> for indent, name, value in line.findall(inputtext):
...     indent = len(indent)
...     if indent > depth:
...         assert not stack[-1], 'unexpected indent'
...     elif indent < depth:
...         stack.pop()
...     stack[-1][name] = value or {}
...     if not value:
...         # new branch
...         stack.append(stack[-1][name])
...     depth = indent
...
{'command': 'echo "hello" | festival --tts', 'icon': 'sayHello.png'}
{'command': 'echo "world" | festival --tts', 'icon': 'sayWorld.png'}
>>> result = stack[0]
>>> from pprint import pprint
>>> pprint(result)
{'launchers': {'say hello': {'command': 'echo "hello" | festival --tts',
                             'icon': 'sayHello.png'},
               'say world': {'command': 'echo "world" | festival --tts',
                             'icon': 'sayWorld.png'},
               'wait': {'command': 'for ((x = 0; x < 10; ++x)); do :; done',
                        'icon': 'wait.png'}}}
from your input text.
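Note that the loop above pops only one level per dedent, so a list that steps back two or more levels at once, or that has a valueless item followed directly by a sibling, would end up nested in the wrong place. If your real files can look like that, here is a minimal sketch that records the indentation width of every open level. It assumes the same regular expression and consistent space-based indentation; `parse_list`, `ITEM` and `pending` are names I made up for this illustration:

```python
import re

# Same pattern as above: leading spaces, "- ", a name, an optional ": value".
ITEM = re.compile(r'( *)- ([^:\n]+)(?:: ([^\n]*))?\n?')


def parse_list(text):
    """Parse an indented "- name: value" list into nested dictionaries."""
    root = {}
    # Each entry pairs an indentation width with the dict collecting the
    # items written at that width.
    stack = [(0, root)]
    pending = None  # dict of the previous item if it had no value
    for spaces, name, value in ITEM.findall(text):
        indent = len(spaces)
        if indent > stack[-1][0]:
            # Deeper than the current level: only valid directly after a
            # valueless item, which becomes the new container.
            if pending is None:
                raise ValueError(f'unexpected indent before {name!r}')
            stack.append((indent, pending))
        else:
            # Same level or a dedent: close every deeper level.
            while indent < stack[-1][0]:
                stack.pop()
            if indent != stack[-1][0]:
                raise ValueError(f'inconsistent dedent before {name!r}')
        container = stack[-1][1]
        if value:
            container[name] = value
            pending = None
        else:
            container[name] = pending = {}
    return root


# A two-level dedent: "d" returns straight to the top level.
print(parse_list("- a\n    - b\n        - c: 1\n- d: 2\n"))
# {'a': {'b': {'c': '1'}}, 'd': '2'}
```

For the input in the question this should give the same dictionary as the shorter loop; the difference only shows up with less regular nesting.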