Using textwrap.wrap with a byte count


How can I use the textwrap module to break a line before it reaches a certain number of bytes (without splitting a multi-byte character)?

I would like something like this (bytewidth is a hypothetical argument that textwrap.wrap does not have):

>>> textwrap.wrap('☺ ☺☺ ☺☺ ☺ ☺ ☺☺ ☺☺', bytewidth=10)
☺ ☺☺
☺☺ ☺
☺ ☺☺
☺☺
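
To make the problem concrete: the width argument of textwrap.wrap counts characters, so a line that fits in 10 characters can take far more than 10 bytes once encoded. A minimal check, assuming UTF-8 as the target encoding:

```python
import textwrap

text = '☺ ☺☺ ☺☺ ☺ ☺ ☺☺ ☺☺'

# One character, but three bytes in UTF-8.
print(len('☺'), len('☺'.encode('utf-8')))

# width=10 limits the number of characters per line, not the number of bytes.
lines = textwrap.wrap(text, width=10)
for line in lines:
    print(repr(line), 'chars:', len(line), 'bytes:', len(line.encode('utf-8')))
```

Every line stays within 10 characters, yet at least one of them is longer than 10 bytes.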

Solution

  • I ended up rewriting part of textwrap so that the words are encoded after the string has been split into chunks.

    Unlike Tom's solution, this Python code does not need to iterate through every character.

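    import sys
    import textwrap

    def byteTextWrap(text, size, break_long_words=True):
        """Similar to textwrap.wrap(), but considers the size of strings (in bytes)
        instead of their length (in characters).
        On Python 2, text is expected to be a UTF-8 encoded str."""
        try:
            words = textwrap.TextWrapper()._split_chunks(text)
        except AttributeError: # Python 2
            words = textwrap.TextWrapper()._split(text)
        words.reverse() # use it as a stack
        if sys.version_info[0] >= 3:
            words = [w.encode('utf-8') for w in words]
        lines = [b'']
        while words:
            word = words.pop(-1)
            if break_long_words and len(word) > size:
                # Never cut inside a multi-byte character: UTF-8 continuation
                # bytes have the form 0b10xxxxxx, so move the cut off them.
                raw = bytearray(word) # indexing yields ints on Python 2 and 3
                cut = size
                while cut > 0 and (raw[cut] & 0xC0) == 0x80:
                    cut -= 1
                if cut == 0: # size is smaller than one character; keep it whole
                    cut = 1
                    while cut < len(raw) and (raw[cut] & 0xC0) == 0x80:
                        cut += 1
                if cut < len(word):
                    words.append(word[cut:]) # push the remainder back
                word = word[:cut]
            # An empty line always takes the next chunk, even an oversized one.
            if not lines[-1] or len(lines[-1]) + len(word) <= size:
                lines[-1] += word
            else:
                lines.append(word)
        if sys.version_info[0] >= 3:
            return [l.decode('utf-8') for l in lines]
        else:
            return lines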
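
The function above keeps whitespace chunks, so a line can start or end with a space. If the goal is the exact output shown in the question, a different and simpler approach may be enough. The following is only a sketch for Python 3, not the code above: wrap_bytes is a hypothetical name, UTF-8 is an assumed default, and words are separated with str.split(), so runs of whitespace collapse to a single space.

```python
def wrap_bytes(text, size, encoding='utf-8'):
    """Greedy word wrap where each line holds at most `size` encoded bytes.

    Hypothetical helper, Python 3 only. A single character wider than
    `size` is kept whole rather than cut in half.
    """
    def blen(s):
        # Size of s in bytes once encoded.
        return len(s.encode(encoding))

    lines = []
    current = ''
    for word in text.split():
        candidate = word if not current else current + ' ' + word
        if blen(candidate) <= size:
            current = candidate
            continue
        if current:
            lines.append(current)
            current = ''
        # The word starts a new line; break it on character boundaries
        # while it is still too long to fit on a line by itself.
        while blen(word) > size:
            head = ''
            for ch in word:
                if blen(head + ch) > size:
                    break
                head += ch
            if not head:
                head = word[0]  # one character wider than size
            lines.append(head)
            word = word[len(head):]
        current = word
    if current:
        lines.append(current)
    return lines


for line in wrap_bytes('☺ ☺☺ ☺☺ ☺ ☺ ☺☺ ☺☺', 10):
    print(line)
# ☺ ☺☺
# ☺☺ ☺
# ☺ ☺☺
# ☺☺
```

Only words that are too long for a line are walked character by character; everything else is measured one word at a time.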