Tags: python, inheritance, new-operator

python: subclass of subclass of int


I'm trying to understand how to subclass int properly. One goal is to define types used in structures within a certain binary file format. For example, an unsigned, 16-bit integer. I've defined a class as follows, which seems to do what I expect:

class uint16(int):

    def __new__(cls, val):
        if (val < 0 or val > 0xffff):
            raise ValueError("uint16 must be in the range %d to %d" % (0, 0xffff))
        return super(cls, cls).__new__(cls, val)

Now, I'm not super clear on the use of super with no args versus (type, object) versus (type, type). I've used super(cls, cls) as I saw that used for a similar scenario.

Now, C makes it easy to create types that are effectively aliases of existing types. For example,

typedef unsigned int        UINT;

Aliases might be considered useful to help clarify the intended usage for a type. Whether one agrees or not, a description of a binary format can sometimes do this, and if so, then for clarity it would be helpful to replicate this in Python.

So, I tried the following:

class Offset16(uint16):

    def __new__(cls, val):
        return super(cls, cls).__new__(cls, val)

I could have made Offset16 a subclass of int, but then I'd want to repeat the validation (more duplicated code). By sub-classing uint16, I avoid the duplicate code.

But when I try to construct an Offset16 object, I get a recursion error:

>>> x = Offset16(42)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __new__
  File "<stdin>", line 5, in __new__
  File "<stdin>", line 5, in __new__
  File "<stdin>", line 5, in __new__
  [Previous line repeated 987 more times]
  File "<stdin>", line 3, in __new__
RecursionError: maximum recursion depth exceeded in comparison
>>> 

Since the call stack is just line 5 repeated (not lines 3 and 5 in alternation), the line in uint16.__new__ is being re-entered.

I then tried revising Offset16.__new__ in different ways, changing the args to super, most of which didn't work. But one last attempt was the following:

class Offset16(uint16):

    def __new__(cls, val):
        return super(uint16, cls).__new__(cls, val)

This seems to work:

>>> x = Offset16(42)
>>> x
42
>>> type(x)
<class '__main__.Offset16'>

Why the difference?

This latter approach seems to defeat part of the purpose of super: avoiding reference to the base class to make it easier to maintain. Is there a way to make this work that doesn't require reference to uint16 within the __new__ implementation?

And what's the best way to do this?


Solution

  • Comments have supplied info that has helped answer Why the difference? and What's the best way?


    First: Why the difference?

    In the original definitions of uint16 and Offset16, the __new__ method uses super(cls, cls). As @juanpa.arrivillaga pointed out, when Offset16.__new__ is called, this leads to uint16.__new__ calling itself recursively. Having Offset16.__new__ use super(uint16, cls) instead changes which method the lookup reaches: it no longer goes through uint16.__new__ at all.

    Some additional explanation may help to understand:

    The cls argument passed into Offset16.__new__ is the Offset16 class itself. So, when the implementation of the method refers to cls, that is a reference to Offset16. So,

        return super(cls, cls).__new__(cls, val)
    

    is equivalent in that case to

        return super(Offset16, Offset16).__new__(Offset16, val)
    

    Now we might think of super as returning the base class, but its semantics are more subtle: super resolves a reference to a method, and its arguments control how that resolution happens. With no arguments, super() inside a method is equivalent to super(C, first_arg), where C is the class in whose body the method is defined (not the runtime class of the object) and first_arg is the method's first argument. For super(type1, type2), the MRO (method resolution order) of type2 is searched for type1, and the attribute lookup starts at the class that follows type1 in that sequence.

    (This is explained in the documentation of super, though the wording could be clearer.)
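    The lookup rule can be checked directly. The following is a minimal sketch, assuming Python 3; uint16 is simplified here to use zero-argument super() so that the example runs, and Offset16 is left empty because only its MRO matters for the lookup.

```python
class uint16(int):
    def __new__(cls, val):
        if val < 0 or val > 0xffff:
            raise ValueError("uint16 must be in the range %d to %d" % (0, 0xffff))
        return super().__new__(cls, val)


class Offset16(uint16):
    pass


# The MRO that super() searches.
print([c.__name__ for c in Offset16.__mro__])
# ['Offset16', 'uint16', 'int', 'object']

# super(type1, type2) starts the lookup at the class after type1 in type2's MRO.
assert super(Offset16, Offset16).__new__ == uint16.__new__
assert super(uint16, Offset16).__new__ == int.__new__
```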

    The MRO for Offset16 is (Offset16, uint16, int, object). Therefore

        return super(Offset16, Offset16).__new__(Offset16, val)
    

    resolves to

        return uint16.__new__(Offset16, val)
    

    When uint16.__new__ is called in this way, the class argument passed to it is Offset16, not uint16. As a result, when its implementation has

        return super(cls, cls).__new__(cls, val)
    

    that once again will resolve to

        return uint16.__new__(Offset16, val)
    

    This is why we end up with an infinite loop.
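    To watch the re-entry without hitting the recursion limit, here is a small sketch based on the classes from the question. The calls list and the three-call guard are additions made purely for illustration; they are not part of the original code.

```python
calls = []  # hypothetical trace: records the cls seen by uint16.__new__


class uint16(int):
    def __new__(cls, val):
        calls.append(cls.__name__)
        if len(calls) >= 3:
            # Illustration only: break out instead of recursing forever.
            return int.__new__(cls, val)
        # With cls == Offset16 this is super(Offset16, Offset16),
        # which resolves to uint16.__new__ again.
        return super(cls, cls).__new__(cls, val)


class Offset16(uint16):
    def __new__(cls, val):
        return super(cls, cls).__new__(cls, val)


x = Offset16(42)
print(calls)  # ['Offset16', 'Offset16', 'Offset16']
```

    Every entry is Offset16: uint16.__new__ keeps receiving the subclass as cls, so super(cls, cls) keeps resolving back to uint16.__new__.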

    But in the changed definition of Offset16,

    class Offset16(uint16):
    
        def __new__(cls, val):
            return super(uint16, cls).__new__(cls, val)
    

    the last line is equivalent to

            return super(uint16, Offset16).__new__(Offset16, val)
    

    and per the MRO for Offset16 and the semantics for super mentioned above, that resolves to

            return int.__new__(Offset16, val)
    

    That explains why the changed definition results in a different behaviour.
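    One side effect is worth checking: because the lookup goes straight to int.__new__, the range check in uint16.__new__ never runs for Offset16. A minimal sketch, using the definitions from the question (the variable name too_big is only for illustration):

```python
class uint16(int):
    def __new__(cls, val):
        if val < 0 or val > 0xffff:
            raise ValueError("uint16 must be in the range %d to %d" % (0, 0xffff))
        return super(cls, cls).__new__(cls, val)


class Offset16(uint16):
    def __new__(cls, val):
        # Lookup starts after uint16 in the MRO, so this reaches int.__new__.
        return super(uint16, cls).__new__(cls, val)


too_big = Offset16(0x10000)  # no ValueError: uint16.__new__ was skipped
print(int(too_big))          # 65536

try:
    uint16(0x10000)
except ValueError as exc:
    print("uint16 still validates:", exc)
```

    So this variant avoids the recursion but also loses the validation that subclassing uint16 was meant to reuse.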


    Second: What's the best way to do this?

    Different alternatives were provided in comments that might fit different situations.

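    @juanpa.arrivillaga suggested (assuming Python 3) simply using super() without arguments. For the approach taken in the question, this makes sense: zero-argument super() inside uint16.__new__ always means super(uint16, cls), regardless of which subclass cls is, so the lookup moves on to int.__new__ and cannot loop back. The reason for passing arguments to super explicitly would be to manipulate the MRO search, and in this simple class hierarchy that's not needed.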
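    A sketch of that suggestion, assuming Python 3, with the classes otherwise as in the question:

```python
class uint16(int):
    def __new__(cls, val):
        if val < 0 or val > 0xffff:
            raise ValueError("uint16 must be in the range %d to %d" % (0, 0xffff))
        # Zero-argument form: lookup starts after uint16, whatever cls is.
        return super().__new__(cls, val)


class Offset16(uint16):
    def __new__(cls, val):
        # Lookup starts after Offset16, so this reaches uint16.__new__.
        return super().__new__(cls, val)


x = Offset16(42)
print(x, type(x).__name__)  # 42 Offset16

try:
    Offset16(-1)
except ValueError as exc:
    print("rejected:", exc)
```

    Here the validation in uint16.__new__ runs for Offset16 as well, and neither class has to name its base inside __new__.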

    @Jason Yang suggested referring directly to the specific superclass rather than using super. For instance:

    class Offset16(uint16):
    
        def __new__(cls, val):
            return uint16.__new__(cls, val)
    

    That is perfectly fine for this simple situation. But it might not be the best for other scenarios with more complex class relationships. Note, for instance, that uint16 is named twice in the above. If the subclass had several methods that wrapped (rather than replaced) the superclass method, there would be many such repeated references, and a change to the class hierarchy could then introduce hard-to-analyze bugs. Avoiding such problems is one of the intended benefits of using super.

    Finally, @Adam.Er8 suggested simply using

    Offset16 = uint16
    

    That's very simple, indeed. The one caveat is that Offset16 is then truly no more than an alias for uint16; it's not a separate class. For example:

    >>> Offset16 = uint16
    >>> x = Offset16(24)
    >>> type(x)
    <class '__main__.uint16'>
    

    So, this may be fine so long as there's never a need in the app to have an actual type distinction.
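    If a real type distinction is wanted without repeating __new__, one further option (not raised in the comments, so treat it as a suggestion) is an empty subclass, which inherits uint16.__new__ unchanged. This sketch assumes uint16 uses zero-argument super(); Alias16 is a hypothetical name used only to contrast the two approaches.

```python
class uint16(int):
    def __new__(cls, val):
        if val < 0 or val > 0xffff:
            raise ValueError("uint16 must be in the range %d to %d" % (0, 0xffff))
        return super().__new__(cls, val)


Alias16 = uint16  # plain alias: the very same class object


class Offset16(uint16):  # distinct type, inherits __new__ and its range check
    pass


print(Alias16 is uint16)                 # True
print(type(Alias16(24)).__name__)        # uint16
print(type(Offset16(24)).__name__)       # Offset16
print(isinstance(Offset16(24), uint16))  # True
```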