Consider this text:
Would you like to have responses to your questions sent to you via email ?
I want to offer multiple choices for several of its words by marking them up like this:
Would you like [to get]|[having]|g[to have] responses to your questions sent [up to]|g[to]|[on] you via email ?
The choices are bracketed and separated by pipes.
The correct choice is preceded by a g.
I would like to parse this sentence to get the text formatted like this:
Would you like __ responses to your questions sent __ you via email ?
With a list like:
[
    [
        {"to get": 0},
        {"having": 0},
        {"to have": 1},
    ],
    [
        {"up to": 0},
        {"to": 1},
        {"on": 0},
    ],
]
Is my markup design OK?
How can I use a regex on the sentence to get the needed result and generate the list?
Edit: the markup language needs to be user-oriented.
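Here is my suggestion too. Each group of choices goes in braces, the choices are separated by pipes, and the correct one is prefixed with a +: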
Would you like {to get|having|+to have} responses to your questions sent {up to|+to|on} you via email ?
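import re

def extract_choices(text):
    choices = []

    def callback(match):
        # Split the text between the braces into its variants.
        variants = match.group(1).split('|')
        # Map each variant to True if it carries the '+' marker.
        choices.append(dict(
            (v.lstrip('+'), v.startswith('+'))
            for v in variants
        ))
        return '___'

    # Non-greedy match, so each {...} group is handled separately.
    text = re.sub(r'\{(.*?)\}', callback, text)
    return text, choices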
Let's try it:
>>> import pprint
>>> t = 'Would you like {to get|having|+to have} responses to your questions sent {up to|+to|on} you via email?'
>>> pprint.pprint(extract_choices(t))
('Would you like ___ responses to your questions sent ___ you via email?',
 [{'having': False, 'to get': False, 'to have': True},
  {'on': False, 'to': True, 'up to': False}])
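If you would rather keep the bracket markup from the question, the same callback approach should carry over. The following is only a sketch under some assumptions: the function name `parse_bracket_markup` and the `blank` parameter are made up for illustration, a group is assumed to have no spaces around its pipes, and the option text is assumed not to contain `]`. It returns the list in the shape the question asks for, one single-key dict per choice with 0 or 1 as the value.

```python
import re

# One option: an optional "g" marker followed by bracketed text.
# \b keeps the final "g" of an ordinary word from being read as a marker.
OPTION = r'(?:\bg)?\[[^\]]*\]'
# One group: one or more options joined by pipes, without surrounding spaces.
GROUP = re.compile(rf'{OPTION}(?:\|{OPTION})*')

def parse_bracket_markup(text, blank='__'):
    """Hypothetical helper: return (text with blanks, list of choice groups)."""
    groups = []

    def callback(match):
        # Each option yields a (marker, label) pair; marker is 'g' or ''.
        options = re.findall(r'(g?)\[([^\]]*)\]', match.group())
        groups.append([{label: int(marker == 'g')} for marker, label in options])
        return blank

    return GROUP.sub(callback, text), groups

sentence = ('Would you like [to get]|[having]|g[to have] responses to your '
            'questions sent [up to]|g[to]|[on] you via email ?')
stripped, groups = parse_bracket_markup(sentence)
```

Here `stripped` holds the sentence with `__` in place of each group, and `groups` holds one inner list per blank, in the order the choices were written. Keeping each choice as its own dict preserves that order, which a single dict per group would not guarantee on older Python versions.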