While trying to find the most efficient way to filter dictionaries, I stumbled upon a strange behaviour.
I made 4 tests, the first one filters the dictionary sequentially.
The second one does a single filtering with a combination of the rules (which is actually the most efficient way).
Then, I made attempt to make it more generic so that the filter may be used with an arbitrary number of predicates, that could be eventually user-defined, not hard-coded.
And I realised that combining predicated with all
was much more inefficient than doing two filterings one after the other.
What can explain this ? Is this the all()
function that has poor performance ?
Would you suggest any other way to improve performance still in a generic way ?
# TEST 1 (Took 1.717677354812622s)
y = {k:x for k,x in y.items() if x['id'] >= 3 }
y = {k:x for k,x in y.items() if x['name'].find('a') != -1 }
# TEST 2 (Took 1.411365032196045s)
y = {k:x for k,x in y.items() if x['id'] >= 3 and x['name'].find('a') != -1 }
# TEST 3 (Took 3.4738941192626953s)
predicates = [
lambda x: x['id'] >= 3,
lambda x: x['name'].find('a') != -1
]
y = {k:x for k,x in y.items() if all([f(x) for f in predicates]) }
# TEST 4 (Took 2.4156315326690674s)
predicates = [
lambda x: x['id'] >= 3,
]
y = {k:x for k,x in y.items() if predicates[0](x) }
predicates = [
lambda x: x['name'].find('a') != -1
]
y = {k:x for k,x in y.items() if predicates[0](x) }
Testing bench :
from var_dump import var_dump
import time
start_time = time.time()
for p in range(0,1000000):
users = {
1: {'id': 1, 'name': "toto"},
2: {'id': 2, 'name': "titi"},
3: {'id': 3, 'name': "tata"},
4: {'id': 4, 'name': "tutu"},
5: {'id': 5, 'name': "john"},
6: {'id': 6, 'name': "jane"}
}
y = users
#-> test goes here
print(y)
print("--- %s seconds ---" % (time.time() - start_time))
all
evaluates both conditions, whereas and
in TEST_2 is evaluated lazily which means the second condition is evaluated for ids >= 3 only.
Illustrating by simple print statements:
users = {
1: {'id': 1, 'name': "toto"},
2: {'id': 2, 'name': "titi"},
3: {'id': 3, 'name': "tata"},
4: {'id': 4, 'name': "tutu"},
5: {'id': 5, 'name': "john"},
6: {'id': 6, 'name': "jane"}
}
def check_1(item):
print("check_1")
return item['id'] >= 3
def check_2(item):
print("check_2")
return item['name'].find('a') != -1
# all condition
print("TEST AND")
for k, v in users.items():
check_1(v) and check_2(v)
# all condition
print("TEST ALL")
for k, v in users.items():
all([check_1(v), check_2(v)])
Out:
TEST AND
check_1
check_1
check_1
check_1
check_2
check_1
check_2
check_1
check_2
TEST ALL
check_1
check_2
check_1
check_2
check_1
check_2
check_1
check_2
check_1
check_2
check_1
check_2