
How to keep double newlines in `textwrap.fill`?


I have a text consisting of several paragraphs separated by double newlines. I'd like to wrap it to a line width of 70 while keeping the paragraph breaks. The whole thing should start with one non-indented line beginning with Abstract: Lorem ipsum ..., with all following lines indented to match.

So the whole thing should look like this:

Abstract: Magna risus nonummy mollis mattis neque commodo mattis fusce  
          hendrerit nibh. Lorem massa lorem mauris ad orci quam risus
          viverra aliquet senectus sociis. Donec proin nam dolor neque
          placerat imperdiet eros ullamcorper egestas cum torquent
          habitasse. Risus donec odio nostra ac et pede inceptos
          praesent montes. Neque morbi sit morbi vestibulum
          suspendisse mauris. Lacus massa mollis.

          Donec class integer pede ac sed elit. Fames augue magnis
          sapien natoque nisi. Proin augue mus nisl interdum convallis
          pellentesque conubia.

          Class dolor tempor netus suspendisse odio orci
          vestibulum mus. Netus purus. Lacus metus tempor purus
          adipiscing faucibus eget maecenas. Velit lacus integer
          rhoncus primis nunc quis lorem lacus dictumst hendrerit.

I am trying to use textwrap, but that doesn't produce the desired output. Here's the code:

from loremipsum import get_paragraphs
import textwrap

text = '\n\n'.join(get_paragraphs(3))
item = 'Abstract: '

print(textwrap.fill(item + text, initial_indent='', subsequent_indent=' ' * len(item), replace_whitespace=False))

This works fine for the first paragraph, but the following paragraphs get strange indentation and short lines, like this:

Class vitae
          nonummy imperdiet cras blandit fusce. Massa porta metus
          semper tempor non id viverra eget. Purus morbi lorem semper
          eget. Proin magna tortor metus magnis. Vitae ipsum. Velit
          class aliquet tortor dolor parturient ullamcorper libero ac.

This happens even if I use initial_indent=' '*len(item). Is this a bug? How can I get what I want?


Solution

  • From the documentation:

    Note: If replace_whitespace is false, newlines may appear in the middle of a line and cause strange output. For this reason, text should be split into paragraphs (using str.splitlines() or similar) which are wrapped separately.

    So you should do something like:

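    import textwrap
    from loremipsum import get_paragraphs

    paragraphs = get_paragraphs(3)
    item = 'Abstract: '
    rest_indent = ' ' * len(item)  # hanging indent as wide as the label
    paragraphs[0] = item + paragraphs[0]
    for idx, paragraph in enumerate(paragraphs):
        # Only the first paragraph starts flush left, because it carries the label.
        start_indent = '' if idx == 0 else rest_indent
        print(textwrap.fill(paragraph, initial_indent=start_indent,
                            subsequent_indent=rest_indent, replace_whitespace=False))
        print()  # blank line after each paragraph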
    

    Alternatively, using a generator expression:

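    paragraphs = get_paragraphs(3)
    item = 'Abstract: '
    indent = ' ' * len(item)
    # Indent every line of every paragraph, then swap the first line's indent for the label.
    text = "\n\n".join(textwrap.fill(p, initial_indent=indent, subsequent_indent=indent) for p in paragraphs)
    print(item + text.lstrip())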
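
If the text is already a single string with blank lines between paragraphs, as in the question, the same idea can be packed into a small helper. This is only a minimal sketch, not a definitive implementation: the name `wrap_labelled`, its parameters, and the sample text are hypothetical, and it assumes Python 3 and that paragraphs are separated by one or more blank lines.

```python
import re
import textwrap


def wrap_labelled(text, label, width=70):
    """Wrap blank-line-separated paragraphs under a hanging label.

    Hypothetical helper: the name and signature are illustrative only.
    """
    indent = " " * len(label)
    # Split on blank lines so that each paragraph is wrapped separately,
    # as the textwrap documentation recommends.
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    wrapped = []
    for idx, paragraph in enumerate(paragraphs):
        # Only the very first line carries the label; all others are indented.
        first = label if idx == 0 else indent
        wrapped.append(
            textwrap.fill(paragraph, width=width,
                          initial_indent=first, subsequent_indent=indent)
        )
    # Re-join with a blank line to keep the paragraph breaks.
    return "\n\n".join(wrapped)


# Made-up sample text standing in for the loremipsum output.
sample = (
    "Magna risus nonummy mollis mattis neque commodo mattis fusce "
    "hendrerit nibh. Lorem massa lorem mauris ad orci quam risus.\n\n"
    "Donec class integer pede ac sed elit. Fames augue magnis sapien."
)
print(wrap_labelled(sample, "Abstract: "))
```

Because `replace_whitespace` is left at its default here, single newlines inside a paragraph are turned into spaces before wrapping, and only the blank-line paragraph breaks survive.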