How do I include unicode strings in Python doctests?


I am working on some code that has to manipulate unicode strings. I am trying to write doctests for it, but am having trouble. The following is a minimal example that illustrates the problem:

# -*- coding: utf-8 -*-
def mylen(word):
  """
  >>> mylen(u"áéíóú")
  5
  """
  return len(word)

print mylen(u"áéíóú")

First we run the code to see the expected output of print mylen(u"áéíóú").

$ python mylen.py
5

Next, we run doctest on it to see the problem.

$ python -m doctest mylen.py
5
**********************************************************************
File "mylen.py", line 4, in mylen.mylen
Failed example:
    mylen(u"áéíóú")
Expected:
    5
Got:
    10
**********************************************************************
1 items had failures:
   1 of   1 in mylen.mylen
***Test Failed*** 1 failures.

How then can I test that mylen(u"áéíóú") evaluates to 5?


Solution

  • If a doctest contains unicode strings, the docstring itself has to be a unicode literal. Mind the u prefix!

    # -*- coding: utf-8 -*-
    def mylen(word):
      u"""        <----- SEE 'u' HERE
      >>> mylen(u"áéíóú")
      5
      """
      return len(word)
    
    print mylen(u"áéíóú")
    

    This works as long as the tests pass. On Python 2.x you need one more hack to make verbose doctest mode work, or to get correct tracebacks when a test fails:

    if __name__ == "__main__":
        import sys
        reload(sys)
        sys.setdefaultencoding("UTF-8")
        import doctest
        doctest.testmod()
    

    NB! Only ever use setdefaultencoding for debug purposes. I'd accept it for doctest use, but not anywhere in your production code.
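
As for where the 10 in the failing run comes from: a plausible reading, offered here as an assumption rather than something the post confirms, is that the plain (non-unicode) docstring held the UTF-8 bytes of the example, and each byte was then counted as its own character. The sketch below only illustrates that arithmetic. It is written for Python 3, unlike the Python 2 code above, and the variable names are made up for the illustration.

```python
# -*- coding: utf-8 -*-
# Illustration only: runs on Python 3, where str is always unicode.
word = "áéíóú"

# Five code points, which is what the doctest expects.
char_count = len(word)

# UTF-8 stores each of these accented vowels in two bytes.
utf8_bytes = word.encode("utf-8")
byte_count = len(utf8_bytes)

# Reading those bytes one byte per character (Latin-1) gives ten characters,
# matching the "Got: 10" in the failing run.
misread = utf8_bytes.decode("latin-1")

print(char_count, byte_count, len(misread))
```

With the u prefix on the docstring, the example text is already unicode when doctest sees it, so the count of 5 is what you would expect.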