I need to check if a string is hexadecimal. I learnt 2 approaches -
1.) Looping over each character
all(c in string.hexdigits for c in s) # Straight forward with no optimizations
2.) Use int() function to check for an error
try:
int(s, 16)
return True
except ValueError:
return False
In the 1st case I know the complexity is O(n). But what about the 2nd one? What is the time complexity there?
int(s, 16)
will still have complexity O(n), where n == len(s)
, but the two aren't directly comparable. int
will be iterating over the data at a lower level than all
, which is faster, but int
also does more work (it actually has to compute the integer value of s
).
So which is faster? You have to profile both.
In [1]: s = "783c"
In [2]: import string
In [3]: %timeit all(c in string.hexdigits for c in s)
800 ns ± 3.23 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %%timeit
...: try:
...: int(s, 16)
...: except ValueError:
...: pass
...:
223 ns ± 1.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Looks like internal iteration wins. I tested on an 9-digit string as well, and int
was still about 4x faster.
But what about invalid strings?
In [8]: s = 'g'
In [9]: %%timeit
...: try:
...: int(s, 16)
...: except ValueError:
...: pass
...:
1.09 µs ± 2.62 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [10]: %timeit all(c in string.hexdigits for c in s)
580 ns ± 6.55 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Now, we're basically testing the benefit of short-circuiting versus the cost of catching an exception. What happens if the error occurs later in the string?
In [11]: s = "738ab89ffg"
In [12]: %timeit all(c in string.hexdigits for c in s)
1.59 µs ± 19.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [13]: %%timeit
...: try:
...: int(s, 16)
...: except ValueError:
...: pass
...:
1.25 µs ± 19.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Now we're seeing the benefit to internal iteration again.