Using textwrap.wrap with a byte count

How can I use the textwrap module to wrap text so that each line stays within a given number of bytes, without splitting a multi-byte character?

I would like something like this:

>>> textwrap.wrap('☺ ☺☺ ☺☺ ☺ ☺ ☺☺ ☺☺', bytewidth=10)
☺ ☺☺
☺☺ ☺
☺ ☺☺
☺☺
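
For context, here is a quick check of why the existing width parameter is not enough. It is only a sketch and assumes UTF-8, where ☺ takes 3 bytes: width is measured in characters, so a line can fit in characters and still exceed the byte limit once encoded.

```python
import textwrap

text = '☺ ☺☺ ☺☺ ☺ ☺ ☺☺ ☺☺'

# width=10 is measured in characters, not bytes
lines = textwrap.wrap(text, width=10)
for line in lines:
    # len(line) is the character count; the UTF-8 encoded length can be larger
    print(repr(line), len(line), len(line.encode('utf-8')))
```

Every line stays within 10 characters, but the encoded lines are longer than 10 bytes.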

Solution

I ended up rewriting part of textwrap so that the words are encoded after the string has been split into chunks.

Unlike Tom's solution, the Python code does not need to iterate through every character.

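import sys
import textwrap

def byteTextWrap(text, size, break_long_words=True):
    """Similar to textwrap.wrap(), but considers the size of strings (in bytes)
    instead of their length (in characters)."""
    try:
        words = textwrap.TextWrapper()._split_chunks(text)
    except AttributeError: # Python 2
        words = textwrap.TextWrapper()._split(text)
    words.reverse() # use it as a stack
    if sys.version_info[0] >= 3:
        words = [w.encode() for w in words] # UTF-8 by default
    lines = [b'']
    while words:
        word = words.pop()
        if break_long_words and len(word) > size:
            # Move the cut back so it never lands on a UTF-8
            # continuation byte (0b10xxxxxx), i.e. inside a character.
            cut = size
            while cut > 0 and (ord(word[cut:cut + 1]) & 0xC0) == 0x80:
                cut -= 1
            if cut == 0:
                # size is smaller than one character: keep that character whole
                cut = 1
                while cut < len(word) and (ord(word[cut:cut + 1]) & 0xC0) == 0x80:
                    cut += 1
            words.append(word[cut:])
            word = word[:cut]
        # An empty line always takes the word, even if the word is oversized
        if not lines[-1] or len(lines[-1]) + len(word) <= size:
            lines[-1] += word
        else:
            lines.append(word)
    if sys.version_info[0] >= 3:
        return [l.decode() for l in lines]
    else:
        return lines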
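
When a single word is longer than size, the slice has to land on a character boundary, otherwise decoding the line fails. Below is a minimal sketch of that check in isolation, assuming Python 3 and valid UTF-8 input. The helper name utf8_safe_cut is made up for illustration and is not part of textwrap.

```python
def utf8_safe_cut(data, size):
    """Return the largest index <= size that does not split a UTF-8 character.

    Hypothetical helper for illustration; assumes `data` is valid UTF-8 bytes.
    """
    cut = min(size, len(data))
    # Continuation bytes look like 0b10xxxxxx; step back until the byte at
    # `cut` starts a new character (or `cut` is already at the end of the data).
    while 0 < cut < len(data) and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return cut


word = '☺☺☺☺'.encode('utf-8')      # 4 characters, 12 bytes
cut = utf8_safe_cut(word, 10)       # index 10 would fall inside the 4th character
head, tail = word[:cut], word[cut:]
print(cut, head.decode('utf-8'), tail.decode('utf-8'))
```

Because the cut is moved backwards, both halves decode cleanly. The trade-off is that the first piece may be a few bytes shorter than size.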