Is there a dictionary-like object that is immutable?


I would like a Python object that can flexibly take any key and that I can access by key, like a dictionary, but that is immutable. One option would be to generate a namedtuple dynamically, but is it bad practice to do this? In the example below, a linter would not expect nt to have an attribute a, for instance.

Example:

from collections import namedtuple

def foo(bar):
    MyNamedTuple = namedtuple("MyNamedTuple", [k for k in bar.keys()])
    d = {k: v for k, v in bar.items()}
    return MyNamedTuple(**d)

>>> nt = foo({"a": 1, "b": 2})

Solution

  • As I mentioned in the comments, I'm not sure why this is needed.
    But one could simply override __setitem__ in a dict subclass, although this will most likely cause problems down the line. A minimal example would be:

    class autodict(dict):
        def __init__(self, *args, **kwargs):
            super(autodict, self).__init__(*args, **kwargs)
    
        def __getitem__(self, key):
            val = dict.__getitem__(self, key)
            return val
    
        def __setitem__(self, key, val):
            pass
    
    x = autodict({'a' : 1, 'b' : 2})
    x['c'] = 3
    print(x)
    

    This prints {'a': 1, 'b': 2}; the x['c'] = 3 assignment is silently ignored.
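
    To make the "problems down the line" concrete, here is a minimal sketch of what still gets through when only __setitem__ is overridden. It also shows types.MappingProxyType from the standard library as one possible read-only alternative; that is my suggestion rather than part of the approach above.

```python
from types import MappingProxyType


class autodict(dict):
    """Same idea as above: item assignment is silently ignored."""

    def __setitem__(self, key, val):
        pass


x = autodict({'a': 1, 'b': 2})
x['c'] = 3          # ignored by the overridden __setitem__
x.update(c=3)       # dict.update does not go through __setitem__
del x['a']          # __delitem__ was never overridden
print(x)            # {'b': 2, 'c': 3}

# A read-only view from the standard library refuses item assignment.
frozen = MappingProxyType({'a': 1, 'b': 2})
try:
    frozen['c'] = 3
except TypeError as exc:
    print('refused:', exc)
```

    Note that MappingProxyType is only a view: anyone who still holds a reference to the underlying dict can change it, so the wrapped dict should not be handed out.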


    Some benefits

    In my crude tests, creating objects through dictionary inheritance was somewhere between 40 and 1000 times faster than creating them through named tuples. (See the speed tests below.)

    The in operator tests keys on a dictionary, but on a named tuple it tests the values, not the field names:

    'a' in nt    # False
    'a' in x     # True
    

    You can use dictionary-style key access instead of (for lack of a better term) JavaScript-style attribute access:

    x['a'] == nt.a
    

    Although that's a matter of taste.

    You also don't have to be picky about keys, since dictionaries support essentially any key identifier:

    x = autodict({1 : 'a number'})   # works; x[1] = ... would be ignored by __setitem__
    nt = foo({1 : 'a number'})       # fails
    

    The named tuple version raises ValueError: Type names and field names must be valid identifiers: '1'
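
    The membership and key differences above can be checked with a short sketch. The Point type and the sample keys are made up for illustration.

```python
from collections import namedtuple

Point = namedtuple('Point', ['a', 'b'])   # hypothetical example type
nt = Point(a=1, b=2)
d = {'a': 1, 'b': 2, 1: 'a number', ('x', 'y'): 'a tuple key'}

print('a' in nt)          # False: a tuple checks its values, not its field names
print('a' in nt._fields)  # True: field names live in _fields
print(1 in nt)            # True: 1 is one of the values
print('a' in d)           # True: a dict checks its keys

# Field names must be valid identifiers, so a key like 1 cannot be used.
try:
    namedtuple('Broken', ['1'])
except ValueError as exc:
    error = str(exc)
    print(error)
```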


    Optimizations (timing the thing)

    Now, this is crude, and the results will vary a lot depending on the system, the position of the moon in the sky, etc. But as a rough example:

    import time
    from collections import namedtuple
    
    class autodict(dict):
        def __init__(self, *args, **kwargs):
            super(autodict, self).__init__(*args, **kwargs)
            #self.update(*args, **kwargs)
    
        def __getitem__(self, key):
            val = dict.__getitem__(self, key)
            return val
    
        def __setitem__(self, key, val):
            pass
    
    
    def foo(bar):
        MyNamedTuple = namedtuple("MyNamedTuple", [k for k in bar.keys()])
        d = {k: v for k, v in bar.items()}
        return MyNamedTuple(**d)
    
    start = time.time()
    for i in range(1000000):
        nt = foo({'x'+str(i) : i})
    end = time.time()
    print('Named tuples:', end - start,'seconds.')
    
    start = time.time()
    for i in range(1000000):
        x = autodict({'x'+str(i) : i})
    end = time.time()
    print('Autodict:', end - start,'seconds.')
    

    Results in:

    Named tuples: 59.21987843513489 seconds.
    Autodict: 1.4844810962677002 seconds.
    

    The dictionary setup is, in my book, insanely quicker. That most likely comes from the extra work in the named tuple setup (foo builds a new class with namedtuple() on every call, on top of the two comprehensions), and it can probably be remedied somehow. But for basic understanding this is a big difference. The example obviously doesn't test larger one-time creations or access times. It only asks: "if you use these options to create data sets over a period of time, how much time would you lose?" :)
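
    One possible remedy, sketched under the assumption that the same set of keys shows up more than once: cache the generated class per tuple of field names so namedtuple() only runs once for each key set. The names nt_class, foo_cached and foo_uncached are hypothetical helpers, not part of the code above, and the timings are printed rather than quoted because they depend on the machine.

```python
import timeit
from collections import namedtuple
from functools import lru_cache


@lru_cache(maxsize=None)
def nt_class(fields):
    # Hypothetical helper: build the class once per distinct tuple of field names.
    return namedtuple("MyNamedTuple", fields)


def foo_cached(bar):
    # Reuses a previously built class when the keys have been seen before.
    return nt_class(tuple(bar))(**bar)


def foo_uncached(bar):
    # Builds a brand-new class on every call, like foo above.
    return namedtuple("MyNamedTuple", list(bar))(**bar)


data = {'a': 1, 'b': 2}
assert foo_cached(data) == foo_uncached(data) == (1, 2)

n = 2000
t_uncached = timeit.timeit(lambda: foo_uncached(data), number=n)
t_cached = timeit.timeit(lambda: foo_cached(data), number=n)
t_dict = timeit.timeit(lambda: dict(data), number=n)

print('uncached namedtuple:', t_uncached, 'seconds.')
print('cached namedtuple:  ', t_cached, 'seconds.')
print('plain dict copy:    ', t_dict, 'seconds.')
```

    In the loop above every dictionary has a different key ('x0', 'x1', ...), so a cache like this would not help there; it only pays off when key sets repeat.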

    Bonus: What if you have a large base dictionary, and want to freeze it?

    base_dict = {'x'+str(i) : i for i in range(1000000)}
    
    start = time.time()
    nt = foo(base_dict)
    end = time.time()
    print('Named tuples:', end - start,'seconds.')
    
    start = time.time()
    x = autodict(base_dict)
    end = time.time()
    print('Autodict:', end - start,'seconds.')
    

    Well, the difference was bigger than I expected: about 1038.5 times faster.
    (I was using the CPU for other stuff, but I think this is fair game.)

    Named tuples: 154.0662612915039 seconds.
    Autodict: 0.1483476161956787 seconds.