
overloaded __iter__ is bypassed when deriving from dict


While trying to create a custom case-insensitive dictionary, I came across the following inconvenient and (from my point of view) unexpected behaviour. When deriving a class from dict, the overridden __iter__, keys and values methods are ignored when converting back to a dict. I have condensed it to the following test case:

try:
    from collections.abc import MutableMapping  # Python 3.3+
except ImportError:
    from collections import MutableMapping      # Python 2

class Dict(dict):
    def __init__(self):
        super(Dict, self).__init__(x=1)

    def __getitem__(self, key):
        return 2

    def values(self):
        return 3

    def __iter__(self):
        yield 'y'

    def keys(self):
        return 'z'

    # Borrow the generic items()/iteritems(), which are built on __iter__ and __getitem__
    if hasattr(MutableMapping, 'items'):
        items = MutableMapping.items
    if hasattr(MutableMapping, 'iteritems'):
        iteritems = MutableMapping.iteritems

d = Dict()
print(dict(d))              # {'x': 1}
print(dict(d.items()))      # {'y': 2}

The return values of keys, values, __iter__ and __getitem__ are deliberately inconsistent, only to show which methods are actually called.

The documentation for dict.__init__ says:

If a positional argument is given and it is a mapping object, a dictionary is created with the same key-value pairs as the mapping object. Otherwise, the positional argument must be an iterable object.
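In practice, the "mapping object" case is decided by duck typing: dict() treats any argument that has a keys attribute as a mapping and fetches each value with __getitem__, and treats anything else as an iterable of key-value pairs. A small sketch with two made-up classes (KeysOnly and PairsOnly are hypothetical names, neither derived from dict) shows both paths:

```python
class KeysOnly:
    """Not a dict subclass: only keys() and __getitem__ are provided."""

    def keys(self):
        return ['a', 'b']

    def __getitem__(self, key):
        return key.upper()


class PairsOnly:
    """No keys attribute, so dict() iterates it as (key, value) pairs."""

    def __iter__(self):
        return iter([('a', 1), ('b', 2)])


print(dict(KeysOnly()))   # {'a': 'A', 'b': 'B'}
print(dict(PairsOnly()))  # {'a': 1, 'b': 2}
```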

I guess it has something to do with the first sentence and maybe with optimizations for built-in dictionaries.

Why exactly does dict(d) use neither keys nor __iter__? Is it possible to override the "mapping" behaviour somehow to force the dict constructor to use my representation of the key-value pairs?

Why do I want this? For a case-insensitive but case-preserving dictionary, I wanted to:

  • store (lowercase => (original_case, value)) internally, while appearing as (any_case => value).
  • derive from dict in order to work with some external library code that uses isinstance checks
  • avoid two dictionary lookups, lower_case => original_case followed by original_case => value (this is the solution I am using now instead)
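The two-lookup approach from the last bullet can be sketched as a dict subclass whose own storage is the public original_case => value view, so dict(d) gives a consistent result however the constructor reads it. This is a hypothetical, incomplete sketch: the name CaseInsensitiveDict, string-only keys and the "most recent spelling wins" rule are assumptions, and methods such as pop(), setdefault() and copy() are not overridden, so they would bypass the side table.

```python
class CaseInsensitiveDict(dict):
    """Hypothetical sketch: the dict itself stores original_case -> value,
    and self._original maps lowercase -> original_case.
    Assumes string keys; the most recently written spelling of a key wins."""

    def __init__(self, *args, **kwargs):
        super().__init__()
        self._original = {}  # lowercase key -> key in its stored spelling
        self.update(*args, **kwargs)

    def __setitem__(self, key, value):
        lower = key.lower()
        old = self._original.get(lower)
        if old is not None and old != key:
            # Drop the entry stored under the previous spelling.
            super().__delitem__(old)
        self._original[lower] = key
        super().__setitem__(key, value)

    def __getitem__(self, key):
        # Lookup 1: lowercase -> original case; lookup 2: original case -> value.
        return super().__getitem__(self._original[key.lower()])

    def __delitem__(self, key):
        original = self._original.pop(key.lower())
        super().__delitem__(original)

    def __contains__(self, key):
        return key.lower() in self._original

    def get(self, key, default=None):
        return self[key] if key in self else default

    def update(self, *args, **kwargs):
        # Route every insertion through __setitem__ so both tables stay in sync.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value
```

For example, after d = CaseInsensitiveDict(Foo=1), d['FOO'] returns 1 and dict(d) is {'Foo': 1}, while isinstance(d, dict) still holds.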

If you are interested in the actual use case, here is the corresponding branch.


Solution

  • In the CPython source file dictobject.c, the relevant code is at line 1795 ff.:

    static int
    dict_update_common(PyObject *self, PyObject *args, PyObject *kwds, char *methname)
    {
        PyObject *arg = NULL;
        int result = 0;
    
        if (!PyArg_UnpackTuple(args, methname, 0, 1, &arg))
            result = -1;
    
        else if (arg != NULL) {
            _Py_IDENTIFIER(keys);
            if (_PyObject_HasAttrId(arg, &PyId_keys))
                result = PyDict_Merge(self, arg, 1);
            else
                result = PyDict_MergeFromSeq2(self, arg, 1);
        }
        if (result == 0 && kwds != NULL) {
            if (PyArg_ValidateKeywordArguments(kwds))
                result = PyDict_Merge(self, kwds, 1);
            else
                result = -1;
        }
        return result;
    }
    

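    This tells us that if the argument has a keys attribute, the constructor simply calls PyDict_Merge(). The merge code (l. 1915 ff.) distinguishes between dicts and other objects, and since PyDict_Check() is also true for subclasses of dict, in the version quoted here your Dict takes the dict branch. There the entries are copied directly from the source dict's internal hash table, the "innermost interface" to the object, without calling any user-defined methods; that is why dict(d) returns {'x': 1}. Only for other objects does the generic branch call keys() and then __getitem__ for each key.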

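    So instead of inheriting from dict, use UserDict (the UserDict module in Python 2, collections.UserDict in Python 3). Note that a UserDict instance is not a dict, so isinstance(obj, dict) checks in external code will fail.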
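To illustrate the two branches on CPython, here is a minimal sketch with two made-up classes (DictBased and UserDictBased are hypothetical names): a dict subclass that overrides keys() and __getitem__ is still copied straight from its internal table, while the same overrides on top of collections.UserDict are honoured by the dict constructor.

```python
from collections import UserDict


class DictBased(dict):
    """A dict subclass overriding keys() and __getitem__ (but not __iter__)."""

    def keys(self):
        return ['other']

    def __getitem__(self, key):
        return 'overridden'


class UserDictBased(UserDict):
    """The same overrides on top of collections.UserDict, which is not a dict."""

    def keys(self):
        return ['other']

    def __getitem__(self, key):
        return 'overridden'


print(dict(DictBased(a=1)))      # {'a': 1}: the internal table is copied
print(dict(UserDictBased(a=1)))  # {'other': 'overridden'}: keys() + __getitem__
```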