I am trying to create a helper function which invokes another function multiple times. For the helper function, I want variables to be passed in as **kwargs so as to allow the main function to determine the default values of each parameter.
The arguments passed in can be variable-length iterables and will be regrouped into multiple dictionaries, one per invocation. Here is an example of what the input and parsed form should be:
{'param1': ['arg1'], 'param2': ['arg1', 'arg2', 'arg3'], 'param3': ['arg1', 'arg2']}
#=>
[{'param1': 'arg1', 'param2': 'arg1', 'param3': 'arg1'}, {'param2': 'arg2', 'param3': 'arg2'}, {'param2': 'arg3'}]
Is there a built-in function in Python that flattens a dictionary in this way? I want to preserve the key-value pairings, as they will be used as keyword arguments when invoking the main function.
First, I tried to avoid passing **kwargs into the main function by converting the arguments into lists and then passing them into itertools.zip_longest().
for data, param1, param2, param3 in itertools.zip_longest(external_data, argv1, argv2, argv3):
    foo(data, param1, param2, param3)  # Invoke main function
However, this forces the use of None or some other fill value, which shadows the defaults defined by the main function.
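To make the shadowing concrete, here is a minimal sketch. The function foo and its defaults are hypothetical stand-ins for the main function, and the argument lists are made up for illustration.

```python
import itertools

def foo(data, param1='default1', param2='default2'):
    # Hypothetical stand-in for the main function; it just reports its arguments
    return (data, param1, param2)

external_data = ['target1', 'target2']
argv1 = ['a']        # shorter list, so zip_longest pads it
argv2 = ['x', 'y']

calls = [foo(data, p1, p2)
         for data, p1, p2 in itertools.zip_longest(external_data, argv1, argv2)]

# The second call receives param1=None explicitly, so 'default1' is never used
print(calls)
```

Because the padded None is passed positionally, the function has no way to tell it apart from a deliberate None.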
Second, I used a nested list comprehension to parse **kwargs and create a list of dictionaries similar to what I described above.
foo = [{k: v[idx]
        for k, v in kwargs.items() if idx < len(v) and v[idx] is not escape}
       for idx in range(len(longest_argument_list))]
However, this forced me to iterate over all the kwargs.values() to get the length of the longest argument list before parsing **kwargs.
Ideally, there is a simpler way to flatten **kwargs into multiple dictionaries using a built-in function. If not, there may be a built-in that has better performance than the nested list comprehension method.
It would be nice, but not necessary, to allow some form of sentinel value to signal the need to skip over a specific function invocation's argument (e.g. passing in param1=['arg1', None, 'arg3'] to allow the second invocation of main to use the default value for param1).
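As a small illustration of the sentinel idea, the nested comprehension can be wrapped in a function. The name flatten_with_skip and the escape keyword are assumptions made for this sketch; note that a keyword argument literally named escape could then no longer be forwarded.

```python
def flatten_with_skip(escape=None, **kwargs):
    # Hypothetical helper; the body mirrors the nested comprehension shown earlier
    longest = max(map(len, kwargs.values()), default=0)
    return [{k: v[idx]
             for k, v in kwargs.items()
             if idx < len(v) and v[idx] is not escape}
            for idx in range(longest)]

flattened = flatten_with_skip(param1=['arg1', None, 'arg3'],
                              param2=['a', 'b'])
# The second dictionary has no 'param1' key, so the main function's default applies
print(flattened)
```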
import collections
import inspect
def invoked_function(param, param1=None, param2='', param3='.'):
    """This function only prints its own call, but it would
    perform some actions using param and **kwargs"""
    variables = inspect.currentframe().f_locals
    function = inspect.currentframe().f_code.co_name
    output = f'{function}(**{variables})'
    print(output)
def helper_function(**kwargs):
    external_data = ['target1', 'target2', 'target3', 'target4']
    longest_argument_list = max(kwargs.values(), key=len)
    escape = None
    foo = [{k: v[idx]
            for k, v in kwargs.items() if idx < len(v) and v[idx] is not escape}
           for idx in range(len(longest_argument_list))]
    foo = collections.deque(foo)
    for target in external_data:
        kwargs = foo.popleft() if foo else {}
        invoked_function(target, **kwargs)
if __name__ == '__main__':
    helper_function(param1=['arg1'],
                    param2=['arg1', 'arg2', 'arg3'],
                    param3=['arg1', 'arg2'])
The above script works as is.
After thinking over the problem more, I realised that the original program structure could be improved by separating the flattening function from the helper function.
Using Andrej Kesely's answer as inspiration on how to use zip_longest, I came up with this solution:
import itertools

def generate_flattened_kwargs(**kwargs):
    # Pair the parameter names with each row of arguments from zip_longest
    keyword_argument_mappings = map(zip,
                                    itertools.cycle([kwargs]),
                                    itertools.zip_longest(*kwargs.values()))
    for keyword_arguments in list(keyword_argument_mappings):
        flat_kwargs = dict(keyword_arguments)
        yield flat_kwargs
The most notable drawback is that entries padded with the itertools.zip_longest fillvalue are not filtered out of the resulting dictionaries.
In exchange, this implementation is faster than the dict expression construction.
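A short sketch of that drawback, reusing the same map/zip/cycle construction on made-up arguments. The post-filtering step at the end is one possible workaround, not part of the function above.

```python
import itertools

def generate_flattened_kwargs(**kwargs):
    # Same construction as above: pair the keys with each zip_longest row
    rows = itertools.zip_longest(*kwargs.values())
    for pairs in map(zip, itertools.cycle([kwargs]), rows):
        yield dict(pairs)

result = list(generate_flattened_kwargs(param1=['arg1'],
                                        param2=['arg1', 'arg2']))
# The second dictionary still carries 'param1': None from the fill value
print(result)

# One possible workaround: strip the fill value after the fact
stripped = [{k: v for k, v in d.items() if v is not None} for d in result]
print(stripped)
```

Stripping afterwards reintroduces a dict comprehension per row, so it likely gives back some of the speed advantage.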
After doing some more thinking on this problem (and needing a function that supports sentinel values) I wrote this up:
def generate_flattened_kwargs_with_sentinel(sentinel=None, **kwargs):
    arguments = itertools.zip_longest(*kwargs.values(), fillvalue=sentinel)
    arguments, sentinel_filter = itertools.tee(arguments)
    keyword_argument_pairs = map(zip,
                                 itertools.cycle([kwargs]),
                                 arguments)
    filtered_pairs = map(itertools.compress,
                         keyword_argument_pairs,
                         sentinel_filter)
    for kwargs in filtered_pairs:
        yield dict(kwargs)
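It is almost as fast as the function above, which has no sentinel support. However, it gives up some readability and code flow in exchange for the sentinel values. Additionally, because itertools.compress keeps only the pairs whose value is truthy, the sentinel must be falsy (e.g. 0, None, [], {}, ''), and any other falsy argument value is dropped as well.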
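If falsy arguments such as 0 or '' need to survive, one alternative is to compare against the sentinel by identity instead of relying on truthiness. The sketch below swaps itertools.compress for a dict comprehension, so it is closer to the approach in Andrej Kesely's answer than to the function above; the names _MISSING and generate_flattened_kwargs_identity are hypothetical.

```python
import itertools

# compress() keeps items whose selector is truthy, so 0 and '' act as "drop"
kept = list(itertools.compress(['a', 'b', 'c'], [1, 0, '']))

_MISSING = object()  # hypothetical private sentinel, not part of the original code

def generate_flattened_kwargs_identity(sentinel=_MISSING, **kwargs):
    # Pad short lists with the sentinel, then drop entries by identity only
    rows = itertools.zip_longest(*kwargs.values(), fillvalue=sentinel)
    for values in rows:
        yield {k: v for k, v in zip(kwargs, values) if v is not sentinel}

# Falsy values are preserved; only the padding is removed
preserved = list(generate_flattened_kwargs_identity(a=[0, ''], b=[1]))

# Passing sentinel=None additionally skips explicit None entries
skipped = list(generate_flattened_kwargs_identity(sentinel=None,
                                                  a=['x', None, 'z'], b=[1]))
```

Since this uses a comprehension per row, its timing is probably closer to the comprehension-based variants than to the compress-based one.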
Here are some numbers I got on my system using the timeit module:
Running tests for many_parameters_few_arguments():
It took 1.3088s to complete andrew_sentinel_function.
It took 1.2698s to complete andrew_nosentinel_function.
It took 2.1734s to complete showcase_function.
It took 1.5139s to complete Andrej_Kesely_function.
Running tests for few_parameters_many_arguments():
It took 0.6311s to complete andrew_sentinel_function.
It took 0.6316s to complete andrew_nosentinel_function.
It took 1.0176s to complete showcase_function.
It took 0.7964s to complete Andrej_Kesely_function.
Unfortunately, the system I was using to test didn't have enough RAM for many_parameters_many_arguments().
import timeit
import random as r
import itertools
def andrew_sentinel_function(sentinel=None, **kwargs):
    arguments = itertools.zip_longest(*kwargs.values(), fillvalue=sentinel)
    arguments, sentinel_filter = itertools.tee(arguments)
    keyword_argument_pairs = map(zip,
                                 itertools.cycle([kwargs]),
                                 arguments)
    filtered_pairs = map(itertools.compress,
                         keyword_argument_pairs,
                         sentinel_filter)
    return [dict(kwargs) for kwargs in filtered_pairs]
def andrew_nosentinel_function(**kwargs):
    keyword_argument_mappings = map(zip,
                                    itertools.cycle([kwargs]),
                                    itertools.zip_longest(*kwargs.values()))
    return [dict(keyword_arguments)
            for keyword_arguments in list(keyword_argument_mappings)]
def showcase_function(**kwargs):
    longest_argument_list = max(kwargs.values(), key=len)
    escape = None
    return [{k: v[idx]
             for k, v in kwargs.items() if idx < len(v) and v[idx] is not escape}
            for idx in range(len(longest_argument_list))]
def Andrej_Kesely_function(**kwargs):
    return [{param: value for param, value in zip(kwargs, t1) if value is not None}
            for t1 in itertools.zip_longest(*kwargs.values(), fillvalue=None)]
def few_parameters_many_arguments():
    r.seed(42)
    n_parameters = 100
    n_arguments = 100000
    parameters = [str(i) for i in range(n_parameters)]
    arguments = [[r.randrange(100) for _ in range(n_arguments)]
                 for _ in range(n_parameters)]
    return dict(zip(parameters, arguments))

def many_parameters_few_arguments():
    r.seed(42)
    n_parameters = 100000
    n_arguments = 100
    parameters = [str(i) for i in range(n_parameters)]
    arguments = [[r.randrange(100) for _ in range(n_arguments)]
                 for _ in range(n_parameters)]
    return dict(zip(parameters, arguments))

def many_parameters_many_arguments():
    r.seed(42)
    n_parameters = 100000
    n_arguments = 100000
    parameters = [str(i) for i in range(n_parameters)]
    arguments = [[r.randrange(100) for _ in range(n_arguments)]
                 for _ in range(n_parameters)]
    return dict(zip(parameters, arguments))
if __name__ == '__main__':
    functions = [
        andrew_sentinel_function,
        andrew_nosentinel_function,
        showcase_function,
        Andrej_Kesely_function
    ]
    setups = [
        'kwargs = many_parameters_few_arguments()',
        'kwargs = few_parameters_many_arguments()',
        'kwargs = many_parameters_many_arguments()'
    ]
    for setup in setups:
        print('')
        print(f"Running tests for {setup.split(' = ')[1]}:")
        for function in functions:
            # 'function' is resolved through globals() at timing time
            time = timeit.timeit('function(**kwargs)', setup, globals=globals(), number=1)
            print(f'It took {time:.4f}s to complete {function.__name__}.')
The difference most likely lies in list() and dict(), as they are implemented in C, while dict and list comprehensions execute as Python bytecode.
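A reader who wants to check this on their own machine could isolate the two construction styles in a micro-benchmark. This is only a sketch: the data sizes and repeat count are arbitrary, and the results will vary with the Python version and hardware.

```python
import timeit

keys = [str(i) for i in range(1000)]
values = list(range(1000))

def with_constructor():
    # dict() consumes the zip iterator directly
    return dict(zip(keys, values))

def with_comprehension():
    # The comprehension loops over the same pairs one at a time
    return {k: v for k, v in zip(keys, values)}

for fn in (with_constructor, with_comprehension):
    elapsed = timeit.timeit(fn, number=1000)
    print(f'{fn.__name__}: {elapsed:.4f}s')
```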
In effect, the difference is fairly minimal and will not make much of an impact unless you are processing large amounts of data. I ended up using the generator implementation as it led to better code reusability in my project.
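For illustration, this is roughly how a generator-based flattener can plug into the helper from the question. It is a sketch under assumptions: the flattener here filters out None with a comprehension for brevity, invoked_function returns its arguments instead of printing, and external_data is passed in as a parameter.

```python
import itertools

def generate_flattened_kwargs(**kwargs):
    # Simplified flattener for this sketch: drops padded (None) entries
    for values in itertools.zip_longest(*kwargs.values()):
        yield {k: v for k, v in zip(kwargs, values) if v is not None}

def invoked_function(param, param1=None, param2='', param3='.'):
    # Stand-in for the main function; returns what it was called with
    return (param, param1, param2, param3)

def helper_function(external_data, **kwargs):
    # Once the flattened kwargs run out, fall back to empty dicts (all defaults)
    per_call = itertools.chain(generate_flattened_kwargs(**kwargs),
                               itertools.repeat({}))
    return [invoked_function(target, **call_kwargs)
            for target, call_kwargs in zip(external_data, per_call)]

calls = helper_function(['target1', 'target2', 'target3', 'target4'],
                        param1=['arg1'],
                        param2=['arg1', 'arg2', 'arg3'],
                        param3=['arg1', 'arg2'])
```

Keeping the flattener separate means the helper only decides how to pair targets with keyword dictionaries, which is the reusability benefit mentioned above.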