Tags: python-2.7, unicode, python-sphinx, non-ascii-characters, doctest

How can I test output of non-ASCII characters using Sphinx doctest?


I'm at a loss as to how to test printed output that includes non-ASCII characters using Sphinx doctest.

When I have tests that include code that generates non-ASCII characters, or that contain expected results that include non-ASCII characters, I get encoding errors.

For example, if I have:

def foo():
    return 'γ'

then a doctest including

>>> print(foo())

will produce an error of the form

Encoding error: 'ascii' codec can't encode character u'\u03b3' in position 0: ordinal not in range(128)

as will any test of the form

>>> print('γ')
γ

Only by ensuring that none of my functions whose results I'm attempting to print, and none of the expected printed results, contain such characters can I avoid these errors. As a result I've had to disable many important tests.
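The message above is the one Python raises when a Unicode string is encoded with the ascii codec. As a minimal sketch, assuming the stream the doctest output is written to ends up with an ASCII encoding, the same UnicodeEncodeError can be reproduced outside Sphinx by writing γ to an in-memory text stream whose encoding is set explicitly. write_to_stream is a hypothetical helper used only for illustration.

```python
from __future__ import unicode_literals

import io


def write_to_stream(text, encoding):
    """Hypothetical helper: write text to an in-memory text stream that
    uses the given encoding, and return the encoded bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)
    stream.flush()  # push the encoded bytes through to raw
    return raw.getvalue()


# A UTF-8 stream accepts GREEK SMALL LETTER GAMMA (U+03B3);
# on Python 3 this prints b'\xce\xb3'.
print(repr(write_to_stream('\u03b3', 'utf-8')))

# An ASCII stream (assumed here to stand in for the doctest output
# stream) cannot encode it and raises UnicodeEncodeError.
try:
    write_to_stream('\u03b3', 'ascii')
except UnicodeEncodeError as exc:
    print(exc)
```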

At the top of all my source files I have

# encoding: utf8
from __future__ import unicode_literals

and (in desperation) I've tried things like

doctest_global_setup = (
    '#encoding: utf8\n\n'
    'from __future__ import unicode_literals\n'
)

and

.. testsetup::

   from __future__ import unicode_literals

but these (of course) don't change the outcome.

How can I test output of non-ASCII characters using Sphinx doctest?


Solution

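  • I believe it is due to your from __future__ import unicode_literals statement, which turns 'γ' into a Unicode string instead of a byte string. print implicitly encodes Unicode strings using the encoding of the output stream, normally the terminal's. When the output is not going to a terminal, Python 2 falls back to the ascii codec, which cannot encode characters outside the range 0-127.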

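    If you skip the explicit print, it works with or without the import, because doctest then compares the repr of the result, which is pure ASCII. Without the import you get an encoded byte string whose exact bytes depend on the source encoding ('\xc3\xab' for UTF-8 source, '\x89' if typed into a cp437 Windows console):

    >>> def foo():
    ...     return 'ë'
    ...
    >>> foo()
    '\xc3\xab'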
    

    Or:

    >>> from __future__ import unicode_literals
    >>> def foo():
    ...     return 'ë'
    ...
    >>> foo()
    u'\xeb'
    

    Then you can test for the escaped representation of the string.

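    You can also try changing the encoding print uses for sys.stdout by setting the environment variable PYTHONIOENCODING=utf8 before starting the build, e.g. PYTHONIOENCODING=utf8 make doctest.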
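Testing the escaped representation can also be made independent of both the output encoding and the Python version by printing an ASCII-only escaped form of the result, so print never has to encode a non-ASCII character. Below is a minimal sketch that runs the check through the standard-library doctest API rather than through Sphinx; foo, EXAMPLE and run_example are hypothetical names, and the same >>> lines could presumably be placed in a .. doctest:: block.

```python
from __future__ import print_function, unicode_literals

import doctest


def foo():
    # U+00EB LATIN SMALL LETTER E WITH DIAERESIS, written as an escape so
    # the source file itself stays ASCII-only.
    return '\u00eb'


# Hypothetical doctest text: the expected output is the escaped form \xeb,
# which is plain ASCII and therefore safe to print with any codec.
EXAMPLE = r"""
>>> print(foo().encode('unicode_escape').decode('ascii'))
\xeb
"""


def run_example():
    """Parse EXAMPLE as a doctest, run it with foo in its globals, and
    return the TestResults(failed, attempted) tuple."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(EXAMPLE, {'foo': foo}, 'escaped_output', None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)


print(run_example())
```

The trade-off is readability: the expected output shows \xeb rather than ë, but the test no longer depends on which codec sys.stdout happens to have.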
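PYTHONIOENCODING is read when the interpreter starts, so it has to be set in the environment of the process that runs the doctests; assigning it from inside an already running process does not change the existing sys.stdout. As a sketch, assuming a child interpreter can stand in for that build process, the code below starts sys.executable with the variable set to utf8 and then to ascii, and prints γ from each. print_gamma is a hypothetical helper.

```python
from __future__ import print_function

import os
import subprocess
import sys

# Child program: print GREEK SMALL LETTER GAMMA. The source is ASCII-only;
# the u prefix keeps it a Unicode string on Python 2 and Python 3.3+.
CHILD = "print(u'\\u03b3')"


def print_gamma(io_encoding):
    """Hypothetical helper: run CHILD with PYTHONIOENCODING set and return
    (return code, raw stdout bytes, raw stderr bytes)."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    proc = subprocess.Popen([sys.executable, '-c', CHILD],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            env=env)
    out, err = proc.communicate()
    return proc.returncode, out, err


# With utf8 the child succeeds; on Python 3 this prints 0 b'\xce\xb3'.
code, out, _ = print_gamma('utf8')
print(code, out.rstrip())

# With ascii the child dies with UnicodeEncodeError: a non-zero code and True.
code, _, err = print_gamma('ascii')
print(code, b'UnicodeEncodeError' in err)
```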