When I use a triple-quoted multiline string in Python, I tend to use textwrap.dedent to keep the code readable, with good indentation:
some_string = textwrap.dedent("""
    First line
    Second line
    ...
    """).strip()
However, in Python 3.x, textwrap.dedent doesn't seem to work with byte strings. I encountered this while writing a unit test for a method that returned a long multiline byte string, for example:
# The function to be tested
def some_function():
    return b'Lorem ipsum dolor sit amet\n consectetuer adipiscing elit'

# Unit test
import unittest
import textwrap

class SomeTest(unittest.TestCase):
    def test_some_function(self):
        self.assertEqual(some_function(), textwrap.dedent(b"""
            Lorem ipsum dolor sit amet
             consectetuer adipiscing elit
            """).strip())

if __name__ == '__main__':
    unittest.main()
In Python 2.7.10 the above code works fine, but in Python 3.4.3 it fails:
E
======================================================================
ERROR: test_some_function (__main__.SomeTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test.py", line 16, in test_some_function
    """).strip())
  File "/usr/lib64/python3.4/textwrap.py", line 416, in dedent
    text = _whitespace_only_re.sub('', text)
TypeError: can't use a string pattern on a bytes-like object
----------------------------------------------------------------------
Ran 1 test in 0.001s
FAILED (errors=1)
So: Is there an alternative to textwrap.dedent that works with byte strings?
It seems that dedent does not support byte strings, sadly. However, if you want cross-compatible code, I recommend you take advantage of the six library:
import sys, unittest
from textwrap import dedent

import six

def some_function():
    return b'Lorem ipsum dolor sit amet\n consectetuer adipiscing elit'

class SomeTest(unittest.TestCase):
    def test_some_function(self):
        actual = some_function()
        expected = six.b(dedent("""
            Lorem ipsum dolor sit amet
             consectetuer adipiscing elit
            """)).strip()
        self.assertEqual(actual, expected)

if __name__ == '__main__':
    unittest.main()
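If you only need Python 3, six.b(s) is equivalent to s.encode("latin-1"), so the same expected value can be built with the standard library alone. A minimal sketch, assuming Python 3 and an ASCII-only literal (the name expected is simply carried over from the test above):

```python
import textwrap


def some_function():
    return b'Lorem ipsum dolor sit amet\n consectetuer adipiscing elit'


# Dedent the text literal first, then encode it to bytes.
# ASCII is enough here because the literal contains no other characters.
expected = textwrap.dedent("""
    Lorem ipsum dolor sit amet
     consectetuer adipiscing elit
    """).strip().encode("ascii")

assert some_function() == expected
```

The dedent call still operates on str, so nothing about textwrap changes; only the final encode step produces bytes.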
Using six.b this way is similar to the bullet-point suggestion in your question:
"I could convert to unicode, use textwrap.dedent, and convert back to bytes. But this is only viable if the byte string conforms to some Unicode encoding."
But you're misunderstanding something about encodings here. If you can write the string literal in your test like that in the first place, and have the file successfully parsed by Python (i.e. the correct coding declaration is on the module), then there is no "convert to unicode" step. The file is parsed in the declared encoding (or the default source encoding if you didn't declare one: UTF-8 on Python 3, ASCII on Python 2), so by the time the string is a Python object it is already decoded.
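For bytes that arrive at run time rather than from a literal in your source file, one possible workaround is a latin-1 round trip: latin-1 maps every byte value to exactly one code point, so decoding cannot fail and encoding gives the original bytes back. The sketch below assumes Python 3; dedent_bytes is a hypothetical helper name, not part of textwrap or six. Note that dedent also blanks out whitespace-only lines, and what counts as whitespace there may differ slightly between Python versions.

```python
import textwrap


def dedent_bytes(data):
    """Hypothetical helper: dedent a bytes object via a latin-1 round trip."""
    # latin-1 maps bytes 0x00-0xFF one to one onto U+0000-U+00FF,
    # so this works even for bytes that are not valid UTF-8.
    text = data.decode("latin-1")
    return textwrap.dedent(text).encode("latin-1")


# The first line contains bytes that are not valid UTF-8.
raw = b"""
    Lorem ipsum \xff\xfe
     consectetuer adipiscing elit
    """

result = dedent_bytes(raw).strip()
```

Here the common four-space margin is removed and the non-UTF-8 bytes pass through unchanged.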