Parse text to replace quotes and nested quotes


Using Python, I would like to "educate" the quotes in a plain-text input and turn them into ConTeXt syntax. Here is a (recursive) example:

original text:

Using python, I would like "educate" quotes of 
a plain text input and turn them into the Context syntax. 
Here is a (recursive) example:

output:

Using python, I would like \quotation{educate} quotes of 
a plain text input and turn them into the Context syntax. 
Here is a (recursive) example:

I would like it to handle nested quotations as well:

original text:

Original text: "Using python, I would like 'educate' quotes of 
a plain text input and turn them into the Context syntax. 
Here is a (recursive) example:"

output:

Original text: \quotation{Using python, I would like \quotation{educate} quotes of 
a plain text input and turn them into the Context syntax. 
Here is a (recursive) example:}

And of course, I should take care of edge cases such as:

She said "It looks like we are back in the '90s"

The specification for ConTeXt quotations is here:

http://wiki.contextgarden.net/Nested_quotations#Nested_quotations_in_MkIV

What is the most sensible approach to such a situation? Thank you very much!


Solution

  • This one works with nested quotes, although it does not handle your edge cases (such as the apostrophe in '90s):

    def quote(string):
        text = ''
        stack = []
        for token in iter_tokes(string):
            if is_quote(token):
                if stack and stack[-1] == token: # closing
                    text += '}'
                    stack.pop()
                else: # opening
                    text += '\\quotation{'
                    stack.append(token)
            else:
                text += token
        return text
    
    def iter_tokes(string):
        # Loop instead of recursing, so that inputs with many quotes
        # do not hit Python's recursion limit.
        while True:
            i = find_quote(string)
            if i is None:
                yield string
                return
            if i > 0:
                yield string[:i]
            yield string[i]
            string = string[i+1:]
    
    def find_quote(string):
        for i, char in enumerate(string):
            if is_quote(char):
                return i
        return None
    
    def is_quote(char):
        return char in '\'\"'
    
    def main():
        with open('input.txt') as fh:
            quoted = quote(fh.read())
        print(quoted)  # the function-call form works on Python 2 and 3
    
    main()
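
For the '90s edge case, one possible extension is to decide whether a single quote is really an apostrophe before treating it as a quotation mark. The sketch below is only a heuristic, not a complete solution: the names `is_apostrophe` and `quote_with_apostrophes` are hypothetical, and the rule (a single quote between two letters, or directly before a digit, is an apostrophe) is an assumption that will misread inputs such as a quotation that opens with a number or a plural possessive like dogs'.

```python
def is_apostrophe(text, i):
    """Guess whether the single quote at text[i] is an apostrophe.

    Assumed heuristic (not part of the original answer):
    - between two letters, as in "don't"
    - directly before a digit, as in "'90s"
    """
    prev_char = text[i - 1] if i > 0 else ""
    next_char = text[i + 1] if i + 1 < len(text) else ""
    if prev_char.isalpha() and next_char.isalpha():
        return True
    return next_char.isdigit()


def quote_with_apostrophes(text):
    """Replace quote pairs with \\quotation{...}, leaving apostrophes alone."""
    out = []
    stack = []  # quote characters that are currently open
    for i, char in enumerate(text):
        if char not in "'\"":
            out.append(char)
        elif char == "'" and is_apostrophe(text, i):
            out.append(char)  # keep the apostrophe as plain text
        elif stack and stack[-1] == char:
            out.append("}")  # closing quote
            stack.pop()
        else:
            out.append("\\quotation{")  # opening quote
            stack.append(char)
    return "".join(out)


print(quote_with_apostrophes('She said "It looks like we are back in the \'90s"'))
# -> She said \quotation{It looks like we are back in the '90s}
```

The stack logic is the same as in `quote` above; only the apostrophe check is new, so nested quotes such as "I would like 'educate' quotes" still come out as nested \quotation{...} groups.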