
How do Python's string and unicode coercion/magic functions work?

I'm using Python 2.7.3.

In Python, we use the magic methods __str__ and __unicode__ to define the behavior of str and unicode on our custom classes:

>>> class A(object):
  def __str__(self):
    print 'Casting A to str'
    return u'String'
  def __unicode__(self):
    print 'Casting A to unicode'
    return 'Unicode'

>>> a = A()
>>> str(a)
Casting A to str
'String'
>>> unicode(a)
Casting A to unicode
u'Unicode'

The behavior suggests that the return value from __str__ and __unicode__ is coerced to either str or unicode depending on which magic method is run.

However, if we do this:

>>> class B(object):
  def __str__(self):
    print 'Casting B to str'
    return A()
  def __unicode__(self):
    print 'Casting B to unicode'
    return A()

>>> b = B()
>>> str(b)
Casting B to str

Traceback (most recent call last):
  File "<pyshell#47>", line 1, in <module>
TypeError: __str__ returned non-string (type A)
>>> unicode(b)
Casting B to unicode

Traceback (most recent call last):
  File "<pyshell#48>", line 1, in <module>
TypeError: coercing to Unicode: need string or buffer, A found

Calling str.mro() and unicode.mro() shows that both are subclasses of basestring. However, __unicode__ also allows returning buffer objects, which inherit directly from object and do not inherit from basestring.

So, my question is, what actually happens when str and unicode are called? What are the return value requirements on __str__ and __unicode__ for use in str and unicode?


  • However, __unicode__ also allows returning buffer objects, which inherit directly from object and do not inherit from basestring.

    This is not correct. unicode() can convert a string or a buffer: it makes a "best attempt" at converting the argument it is given to unicode using the default encoding, which is why the error message says "coercing". It always returns a unicode object.

    So, my question is, what actually happens when str and unicode are called? What are the return value requirements on __str__ and __unicode__ for use in str and unicode?

    __str__ should return an informal, human-friendly string representation of the object. This is what is called when someone uses str() on your object, or when your object is part of a print statement.
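
As a rough illustration of the paragraph above, the sketch below defines a hypothetical Point class (not from the question) whose __str__ returns a plain string, and shows that str(), %s formatting, and print all go through it. The names Point, text, and formatted are assumptions made for the example.

```python
class Point(object):
    """Hypothetical example class, not taken from the question."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        # Return an actual string: the informal, readable form of the object.
        return 'Point(%d, %d)' % (self.x, self.y)


p = Point(1, 2)
text = str(p)          # str() calls p.__str__()
formatted = '%s' % p   # %s formatting also goes through __str__
print(p)               # printing the object uses __str__ as well
```

Here both text and formatted end up as the string 'Point(1, 2)'.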

    __unicode__ should always return a unicode object. If this method is not defined, __str__ is called and its result is then coerced to unicode (by passing it to unicode()).

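    In your second example, both methods return an A instance, which is neither a str nor a unicode object, and that is why you see the error messages. Your first example only appears to work because of implicit conversion: the str returned by __unicode__ is decoded, and the unicode returned by __str__ is encoded, using the default encoding. It is still not written correctly, since each method returns the other's type.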
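
To make the failure in the second example concrete, here is a minimal sketch, assuming the intended fix is simply to convert the inner object explicitly before returning it. Broken and Fixed are hypothetical names, and A is a simplified version of the class from the question.

```python
class A(object):
    def __str__(self):
        return 'String'


class Broken(object):
    def __str__(self):
        # Returns an A instance, not a string, so str() rejects it.
        return A()


class Fixed(object):
    def __str__(self):
        # Convert explicitly so that a real string is returned.
        return str(A())


try:
    str(Broken())
    error = None
except TypeError as exc:
    # Same kind of failure as in the question's traceback.
    error = str(exc)

fixed_text = str(Fixed())
```

After this runs, error holds the TypeError message raised for Broken, and fixed_text is 'String'. The same idea applies to __unicode__: return a unicode object yourself rather than another object that merely knows how to produce one.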

    The data model section of the documentation is worth a read for more information and details on these "magic methods".