python string unicode python-2.x conventions

str versus unicode

Is there a Python convention for when you should implement __str__() versus __unicode__()? I've seen classes override __unicode__() more frequently than __str__(), but it doesn't appear to be consistent. Are there specific rules for when it is better to implement one versus the other? Is it necessary, or good practice, to implement both?

Solution

In Python 2.x, __str__() is the old method -- it returns bytes (a str object). __unicode__() is the new, preferred method -- it returns characters (a unicode object). The names are a bit confusing, but in 2.x we're stuck with them for compatibility reasons. Generally, you should put all your string formatting in __unicode__(), and create a stub __str__() method that just encodes the result:

def __str__(self):
    return unicode(self).encode('utf-8')
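
As a fuller sketch of the same pattern, here is a hypothetical Person class (the class name and its name attribute are made up for illustration, not taken from the question). The version check is my own addition so the snippet also runs under Python 3, where the unicode builtin does not exist; under 2.x only the first branch is used.

```python
from __future__ import unicode_literals
import sys

PY2 = sys.version_info[0] == 2


class Person(object):
    """Hypothetical example class; only the string methods matter here."""

    def __init__(self, name):
        self.name = name  # assumed to be a text (unicode) string

    def __unicode__(self):
        # All formatting lives here and returns characters, not bytes.
        return "Person: %s" % self.name

    if PY2:
        def __str__(self):
            # 2.x stub: encode the text form to UTF-8 bytes.
            return unicode(self).encode('utf-8')
    else:
        def __str__(self):
            # Python 3: str already holds characters, so reuse the text form.
            return self.__unicode__()


p = Person("Zo\u00eb")
text = p.__unicode__()
```

Under 2.x, unicode(p) gives the text form and str(p) gives its UTF-8 encoding, so the formatting logic exists in exactly one place.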

In Python 3, str contains characters, so the corresponding methods are named __bytes__() (returns bytes) and __str__() (returns characters). These behave as expected.
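
For comparison, a minimal Python 3 sketch of the same idea, again using a hypothetical Person class. Choosing UTF-8 for __bytes__() is an assumption that mirrors the 2.x stub above; nothing in the language requires that encoding.

```python
class Person:
    """Hypothetical example class for Python 3."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        # Formatting lives here; str holds characters in Python 3.
        return "Person: {}".format(self.name)

    def __bytes__(self):
        # Byte form derived from the text form (UTF-8 is an assumed choice).
        return str(self).encode("utf-8")


p = Person("Zo\u00eb")
as_text = str(p)      # calls __str__
as_bytes = bytes(p)   # calls __bytes__
```

Here str(p) and bytes(p) play the roles that unicode(p) and str(p) play in 2.x.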
