I am working on some code that has to manipulate unicode strings. I am trying to write doctests for it, but am having trouble. The following is a minimal example that illustrates the problem:
# -*- coding: utf-8 -*-
def mylen(word):
    """
    >>> mylen(u"áéíóú")
    5
    """
    return len(word)

print mylen(u"áéíóú")
First we run the code to see the expected output of print mylen(u"áéíóú"):
$ python mylen.py
5
Next, we run doctest on it to see the problem.
$ python -m
5
**********************************************************************
File "mylen.py", line 4, in mylen.mylen
Failed example:
mylen(u"áéíóú")
Expected:
5
Got:
10
**********************************************************************
1 items had failures:
1 of 1 in mylen.mylen
***Test Failed*** 1 failures.
How then can I test that mylen(u"áéíóú") evaluates to 5?
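The 10 in the failure report most likely has this origin: a docstring without a u prefix is a Python 2 byte string holding the UTF-8 encoding of áéíóú. When doctest compiles the example from those bytes, each byte inside the u"..." literal is read as one Latin-1 character. The sketch below is written in Python 3 and only reproduces that byte arithmetic. It does not run Python 2 or doctest itself.

```python
# Reproduce the "Got: 10" arithmetic from the failing doctest (Python 3).
word = "áéíóú"

# The five accented letters as real text: five code points.
print(len(word))  # 5

# A non-u Python 2 docstring in a UTF-8 file stores these bytes instead;
# each of these accented letters takes two bytes in UTF-8.
utf8_bytes = word.encode("utf-8")
print(len(utf8_bytes))  # 10

# Reading each byte back as one Latin-1 character yields a 10-character
# string, which is what len() received inside the failing doctest.
misread = utf8_bytes.decode("latin-1")
print(len(misread))  # 10
```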
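If you want unicode strings, you have to use unicode docstrings! Mind the u! Without it, the docstring is a byte string containing the UTF-8 encoding of áéíóú, and doctest compiles the example from those raw bytes without the file's coding declaration, so each of the 10 bytes becomes a separate character.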
# -*- coding: utf-8 -*-
def mylen(word):  # note the u prefix on the docstring below
    u"""
    >>> mylen(u"áéíóú")
    5
    """
    return len(word)

print mylen(u"áéíóú")
This works, as long as the tests pass. On Python 2.x you need one more hack to make verbose doctest mode work, or to get correct tracebacks when tests fail:
if __name__ == "__main__":
    import sys
    # site.py deletes sys.setdefaultencoding at startup; reload(sys) restores it
    reload(sys)
    sys.setdefaultencoding("UTF-8")
    import doctest
    doctest.testmod()
NB! Only ever use setdefaultencoding for debug purposes. I'd accept it for doctest use, but not anywhere in your production code.
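For comparison, Python 3 string literals and docstrings are already unicode, so neither the u prefix nor the setdefaultencoding hack should be needed there. Here is a minimal sketch, assuming Python 3 and the same mylen function:

```python
import doctest


def mylen(word):
    """Return the number of code points in word.

    >>> mylen("áéíóú")
    5
    """
    return len(word)


if __name__ == "__main__":
    # testmod() runs every docstring example in this module and
    # prints a report only if an example fails.
    doctest.testmod()
```

Running the file directly with a Python 3 interpreter should produce no output when the example passes.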