How can I use the textwrap module to wrap text so that each line stays within a certain number of bytes (without splitting a multi-byte character)?
I would like something like this:
>>> textwrap.wrap('☺ ☺☺ ☺☺ ☺ ☺ ☺☺ ☺☺', bytewidth=10)
☺ ☺☺
☺☺ ☺
☺ ☺☺
☺☺
I ended up rewriting part of textwrap so that it encodes the words after splitting the string.
Unlike Tom's solution, the Python code does not need to iterate through every character.
import sys
import textwrap


def byteTextWrap(text, size, break_long_words=True):
    """Similar to textwrap.wrap(), but considers the size of strings (in bytes)
    instead of their length (in characters)."""
    try:
        words = textwrap.TextWrapper()._split_chunks(text)
    except AttributeError:  # Python 2
        words = textwrap.TextWrapper()._split(text)
    words.reverse()  # use it as a stack
    if sys.version_info[0] >= 3:
        words = [w.encode() for w in words]  # UTF-8 by default
    lines = [b'']
    while words:
        word = words.pop()
        if break_long_words and len(word) > size:
            # Cut at a character boundary: back up while the byte at the
            # cut position is a UTF-8 continuation byte (0b10xxxxxx).
            cut = size
            while cut > 0 and 0x80 <= ord(word[cut:cut + 1]) < 0xC0:
                cut -= 1
            if cut > 0:  # otherwise size is smaller than one character
                words.append(word[cut:])
                word = word[:cut]
        if len(lines[-1]) + len(word) <= size:
            lines[-1] += word
        else:
            lines.append(word)
    if sys.version_info[0] >= 3:
        return [l.decode() for l in lines]
    else:
        return lines
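
Note that byteTextWrap keeps the whitespace chunks produced by textwrap, so a line can begin or end with a space. For comparison, here is a minimal Python 3 sketch that drops the whitespace at line breaks and gives the output asked for in the question. It uses only str.split() and str.encode(). The name wrap_bytes_simple and its encoding parameter are hypothetical, and it assumes that a word longer than the limit may be left unbroken on its own line.

```python
def wrap_bytes_simple(text, size, encoding="utf-8"):
    """Greedily pack whitespace-separated words into lines of at most
    `size` encoded bytes. Hypothetical helper, Python 3 only."""
    lines = []
    current = ""
    for word in text.split():
        # Try to extend the current line by one space plus the next word.
        candidate = word if not current else current + " " + word
        if len(candidate.encode(encoding)) <= size:
            current = candidate
        else:
            if current:
                lines.append(current)
            # A word longer than `size` is kept whole on its own line.
            current = word
    if current:
        lines.append(current)
    return lines


print(wrap_bytes_simple('☺ ☺☺ ☺☺ ☺ ☺ ☺☺ ☺☺', 10))
# ['☺ ☺☺', '☺☺ ☺', '☺ ☺☺', '☺☺']
```

Because only whole words are measured and joined, no multi-byte character is ever split, at the cost of collapsing runs of whitespace into single spaces.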