
Python: A module for wrapping text quoted via email conventions?


I'm looking for a Python module, or some existing Python code, that can wrap text which uses the ">" line prefix to indicate quoted text (see below for an example).

I know that I can use Python's textwrap module to wrap paragraphs of text. However, that module doesn't know about this kind of quoting prefix.
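To illustrate the limitation, here is a minimal sketch that runs two quoted lines through textwrap.fill. The sample string is only an illustration. Because textwrap treats ">" as an ordinary word, the second marker ends up inside the first wrapped line and the continuation line carries no marker at all.

```python
import textwrap

# Two quoted lines that should be re-wrapped as one quoted paragraph.
quoted = "> The quick\n> brown fox jumped over the lazy sleeping dog."

# textwrap collapses the newline into a space and wraps the result,
# without treating the ">" markers specially.
wrapped = textwrap.fill(quoted, width=52)
print(wrapped)
```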

I know how to code up a routine which will perform this text wrapping, and I'm not looking for advice on how to write it. Rather, I'm wondering whether any Python code or modules already exist that can perform this kind of wrapping on email-type quoted text.

I have been searching, but I haven't found anything in Python.

I just don't want to "re-invent the wheel" if something like this has already been written.

Here's an example of the text wrapping that I'd like to perform. Suppose I have the following text that comes from a hypothetical email message:

Abc defg hijk lmnop.

Mary had a little lamb.
Her fleas were white as snow,

> Now is the time for all good men to come to the aid of their party.
>
> The quick
> brown fox jumped over the lazy sleeping dog.

>> When in the Course of human
>> events it
>> becomes necessary for one people to dissolve the political
>> bands
>> which have
>> connected them ...
      and everywhere that Mary went,
      her fleas were sure to go
      ... and to reproduce.
> What do you mean by this?
>> with another
>> and to assume among
>> the powers of the earth ...
> Doo wah diddy, diddy dum, diddy doo.
>> Text text text text text text text text text text text text text text text text text text text text text text text text text text text.

Assuming that I want to wrap at column 52, the resulting text should look like this:

Abc defg hijk lmnop.

Mary had a little lamb. Her fleas were white as
snow,

> Now is the time for all good men to come to the
> aid of their party.
>
> The quick brown fox jumped over the lazy sleeping
> dog.

>> When in the Course of human events it becomes
>> necessary for one people to dissolve the
>> political bands which have connected them ...
      and everywhere that Mary went, her fleas were
      sure to go ... and to reproduce.
> What do you mean by this?
>> with another and to assume among the powers of
>> the earth ...
> Doo wah diddy, diddy dum, diddy doo.
>> Text text text text text text text text text text
>> text text text text text text text text text text
>> text text text text text text text.

Thanks for any references to existing Python code.

If no such thing exists "out in the wild", I'll then write this and post my code here.

Thank you very much.


Solution

  • I couldn't find any existing code which wraps this kind of quoted text, so here's the code that I wrote. It makes use of the re and textwrap modules.

    I break the text into "paragraphs" based on each line's quote or indentation prefix: consecutive lines with the same prefix form one "paragraph". I then use textwrap to wrap each "paragraph" with the prefix removed from each line. After wrapping, I prepend the prefix to each line of the "paragraph" again.

    Some day I will clean up the code and make it a bit more elegant, but at least it seems to work fine, as is.

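    import re
    import textwrap

    def wrapemail(text, wrap=72):
        """Wrap email-style quoted text, keeping '>' and indentation prefixes."""
        if not text:
            return ''
        prefix      = None
        prev_prefix = None
        paragraph   = []
        paragraphs  = []   # list of (prefix, [lines]) tuples
        for line in text.rstrip().split('\n'):
            line = line.rstrip()
            # Split the line into a prefix and the remaining body.
            m = wrapemail.qprefixpat.search(line)
            if m:
                # Quoted line: strip whitespace from the marker ("> >" becomes
                # ">>") and keep one whitespace character after it, if present.
                prefix = wrapemail.whitepat.sub('', m.group(1))
                body   = m.group(2)
                if body and wrapemail.whitepat.search(body[0]):
                    prefix += body[0]
                    body    = body[1:]
            else:
                m = wrapemail.wprefixpat.search(line)
                if m:
                    # Indented line: the leading whitespace is the prefix.
                    prefix = m.group(1)
                    body   = m.group(2)
                else:
                    prefix = ''
                    body   = line
            if not body:
                # A blank line ends the current paragraph and is kept as is.
                if paragraph and prev_prefix is not None:
                    paragraphs.append((prev_prefix, paragraph))
                paragraphs.append((prefix, ['']))
                prev_prefix = None
                paragraph   = []
                # Skip the append below; otherwise a trailing ">" line
                # would be emitted twice.
                continue
            if prefix != prev_prefix:
                # A change of prefix starts a new paragraph.
                if paragraph and prev_prefix is not None:
                    paragraphs.append((prev_prefix, paragraph))
                prev_prefix = prefix
                paragraph   = []
            paragraph.append(body)
        if paragraph and prefix is not None:
            paragraphs.append((prefix, paragraph))
        result = ''
        for prefix, lines in paragraphs:
            body    = '\n'.join(lines).rstrip()
            wraplen = wrap - len(prefix)
            if wraplen < 1:
                # No room left to wrap: emit every line with its prefix.
                for line in lines:
                    result += '{}{}\n'.format(prefix, line)
            elif body:
                for line in textwrap.wrap(body, wraplen):
                    result += '{}{}\n'.format(prefix, line.rstrip())
            else:
                result += '{}\n'.format(prefix)
        return result

    # Leading run of '>' markers (possibly separated by whitespace), then the
    # body. The body may itself contain '>' characters.
    wrapemail.qprefixpat = re.compile(r'^([\s>]*>)(.*)$')
    # Leading whitespace followed by text.
    wrapemail.wprefixpat = re.compile(r'^(\s+)(\S.*)?$')
    wrapemail.whitepat   = re.compile(r'\s')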
    

    Calling it on the text in my original message with 'wrap' set to 52 produces the output that I specified above.

    Feel free to improve upon this or steal it. :)
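
For comparison, the same group-then-wrap idea can be sketched more compactly with itertools.groupby and textwrap's initial_indent and subsequent_indent parameters. This is only a sketch under stated assumptions, not a drop-in replacement for the answer's code: the names QUOTE, INDENT, split_prefix and wrap_quoted are hypothetical, and it assumes a prefix is either a run of ">" markers plus at most one following whitespace character, or plain leading whitespace.

```python
import re
import textwrap
from itertools import groupby

# Hypothetical names for this sketch; none of them come from the answer above.
QUOTE = re.compile(r'^((?:\s*>)+)(\s?)(.*)$')   # one or more '>' markers
INDENT = re.compile(r'^(\s*)(.*)$')             # plain leading whitespace

def split_prefix(line):
    """Return (prefix, body) for a single line."""
    line = line.rstrip()
    m = QUOTE.match(line)
    if m:
        marks, gap, body = m.groups()
        # Normalize "> >" to ">>" and keep at most one space after it.
        return re.sub(r'\s', '', marks) + gap, body
    return INDENT.match(line).groups()

def wrap_quoted(text, width=72):
    """Re-wrap text so that every output line keeps its original prefix."""
    parsed = [split_prefix(line) for line in text.rstrip().split('\n')]
    out = []
    # Consecutive lines sharing a prefix form one paragraph; blank lines
    # get their own group, so they always separate paragraphs.
    for (prefix, blank), group in groupby(parsed, key=lambda p: (p[0], not p[1])):
        bodies = [body for _, body in group]
        if blank:
            out.extend(prefix.rstrip() for _ in bodies)
        elif width - len(prefix) < 1:
            # No room left to wrap: pass the lines through unchanged.
            out.extend(prefix + body for body in bodies)
        else:
            out.extend(textwrap.wrap(' '.join(bodies), width,
                                     initial_indent=prefix,
                                     subsequent_indent=prefix))
    return '\n'.join(out) + '\n'

if __name__ == '__main__':
    print(wrap_quoted("> The quick\n> brown fox jumped over the lazy sleeping dog.", 52))
```

Letting textwrap add the prefix itself means the width passed in is the full line width, so there is no separate subtraction of the prefix length before wrapping.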