Tags: python, python-3.x, dictionary, avro, keyword-argument

Python's **kwargs efficiency


Is it fine to structure a Python 3 call flow like this?

def foo(**kwargs):
    kwargs['kw'] = 1
    return bar(**kwargs, wk=2)
def bar(**kwargs):
    process(1,2,'sss',**kwargs)
    for i in kwargs:
        print(i)
...etc...

Is kwargs going to be a single mutable object (dict) whose reference is passed down the flow, or will it be unpacked and re-created over and over?

A more precise question. If I do this:

def a(**kwargs):
    return b(**kwargs)
def b(**kwargs):
    return c(**kwargs)
...
def z(**kwargs):
    print(kwargs)

Will there be only one dict at a time? If not, will a new object be created with each call, or will they stack up?

The actual case is that I maintain one of several sub-services that communicate using Avro. I have a package that turns the binary payload into a dict; then I need to do some processing and create a new Avro message.

Some fields are not present in the new schema, some are added, and some just pass through untouched.

So I take that first dict and pass it along over and over, adding more and more data. At the end I have another schema, and the Avro package can take this huge dict and serialize only what is defined in the schema.

Is that approach OK?


Solution

  • A new dictionary is built for each **kwargs parameter in each function. That's because the **kwargs syntax in a call is distinct from the same syntax in a function signature:

    • Using **kwargs in a call causes a dictionary to be unpacked into separate keyword arguments.
    • Using **kwargs as a catch-all parameter causes a dictionary to be produced from the keyword arguments being passed in.

    Even if this weren't the case, Python couldn't optimise by sharing the dictionary: when foo() calls bar(), the kwargs dictionary passed into the call could be mutated by bar(), and foo() would have to handle that possibility.
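
To make the mutation point concrete, here is a minimal sketch. The function names caller() and callee() and the key names are made up for illustration. It shows that changes made inside the callee never reach the dictionaries held by the caller:

```python
def caller(**kwargs):
    # kwargs is a fresh dict built from the keyword arguments received.
    callee(**kwargs)  # unpacked into keywords, then re-collected inside callee
    return kwargs


def callee(**kwargs):
    # These changes only affect callee's own dict.
    kwargs['added_by_callee'] = True
    kwargs['kw'] = 'changed'


original = {'kw': 1}
seen_by_caller = caller(**original)

print(original)                    # {'kw': 1}
print(seen_by_caller)              # {'kw': 1}
print(seen_by_caller is original)  # False
```

Note that the copy is shallow: only the mapping is rebuilt, so a mutable value such as a list stored under a key is still the same object in every function.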

    So, no, using **kwargs in a series of connected functions is not going to give you more efficiency.

    A quick demo to show that the dictionaries passed into a series of functions are distinct:

    >>> def a(**kwargs):
    ...     print('a', id(kwargs))
    ...     b(**kwargs)
    ...
    >>> def b(**kwargs):
    ...     print('b', id(kwargs))
    ...
    >>> a(foo='bar')
    a 4556474552
    b 4557517304
    

    If the dictionaries were shared, their id() value would also be the same.
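
On the question of whether the dictionaries stack up: each function's kwargs stays alive for as long as that function is running, so in a chain of nested calls there is one dictionary per active call. The following sketch (the live_dicts list and the three-function chain are illustrative assumptions, not part of the original code) keeps a reference to each one and counts the distinct objects:

```python
live_dicts = []  # holds a reference to every kwargs dict we see


def a(**kwargs):
    live_dicts.append(kwargs)
    return b(**kwargs)


def b(**kwargs):
    live_dicts.append(kwargs)
    return c(**kwargs)


def c(**kwargs):
    live_dicts.append(kwargs)
    # At this point a's, b's and c's dicts all exist at the same time.
    return len({id(d) for d in live_dicts})


distinct = a(foo='bar')
print(distinct)                                    # 3
print(all(d == {'foo': 'bar'} for d in live_dicts))  # True
```

The dictionaries have equal contents but are three separate objects, one per call in the chain.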

    If you want to pass along shared state between functions, do so explicitly. Pass the dictionary along directly, for example:

    def foo(**state):
        state['kw'] = 1
        state['wk'] = 2
        return bar(state)  # the state dict is passed in as a single positional argument
    
    def bar(state):
        # etc.
        pass
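
For comparison, a small sketch of the explicit approach, assuming a hypothetical three-step pipeline (add_defaults(), enrich() and finish() are invented names). Because the dictionary is passed as an ordinary argument, every step works on the same object and no copies are made:

```python
def add_defaults(state):
    state['kw'] = 1
    return enrich(state)   # same dict object handed on, no unpacking


def enrich(state):
    state['wk'] = 2
    return finish(state)


def finish(state):
    return state


record = {'name': 'example'}
result = add_defaults(record)

print(result is record)  # True
print(record)            # {'name': 'example', 'kw': 1, 'wk': 2}
```

The trade-off is that mutations are now visible to every holder of the dictionary, which is what you want for accumulating fields along a pipeline but needs care if a caller expects its input to stay unchanged.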