I have this code:
s = set([5,6,7,8])
if key in s:
return True
if key not in s:
return False
It seems to me that it shouldn't, in theory, differ time wise, but I may be missing something under the hood.
Is there any reason to prefer one over the other in terms of processing time or readability?
Perhaps is this an example of:
"Premature optimization is the root of all evil"?
OK, I ran this test:
import random
s = set([5,6,7,8])
for _ in range(5000000):
s.add(random.randint(-100000,100000000))
def test_in():
count = 0
for _ in range(50000):
if random.randint(-100000,100000000) in s:
count += 1
print(count)
def test_not_in():
count = 0
for _ in range(50000):
if random.randint(-100000,100000000) not in s:
count += 1
print(count)
When I time the outputs:
%timeit test_in()
10 loops, best of 3: 83.4 ms per loop
%timeit test_not_in()
10 loops, best of 3: 78.7 ms per loop
BUT, that small difference seems to be a symptom of counting the components. There are an average of 47500 "not ins" but only 2500 "ins". If I change both tests to pass, e.g.:
def test_in():
for _ in range(50000):
if random.randint(-100000,100000000) in s:
pass
The results are nearly identical
%timeit test_in()
10 loops, best of 3: 77.4 ms per loop
%timeit test_not_in()
10 loops, best of 3: 78.7 ms per loop
In this case, my intuition failed me. I had thought that saying it is not in the set
could have had added some additional processing time. When I further consider what a hashmap does, it seems obvious that this can't be the case.
Running a simple performance test in an ipython session with timeit
confirms g.d.d.c's statement.
def one(k, s):
if k in s:
return True
def two(k, s):
if k not in s:
return False
s = set(range(1, 100))
%timeit -r7 -n 10000000 one(50, s)
## 83.7 ns ± 0.874 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit -r7 -n 10000000 two(50, s)
## 86.1 ns ± 1.11 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
Optimisations such as this aren't going to gain you a lot, and as has been pointed out in the comments will in fact reduce the speed at which you'll push out bugfixes/improvements/... due to bad readability. For this type of low-level performance gains, I'd suggest looking into Cython or Numba.