How can we efficiently check if a string is hexadecimal in Python

I need to check if a string is hexadecimal. I learnt 2 approaches -

1.) Looping over each character

all(c in string.hexdigits for c in s) # Straight forward with no optimizations

2.) Use int() function to check for an error

try:
    int(s, 16)
    return True
except ValueError:
    return False

In the 1st case I know the complexity is O(n). But what about the 2nd one? What is the time complexity there?

Solution

int(s, 16) will still have complexity O(n), where n == len(s), but the two aren't directly comparable. int will be iterating over the data at a lower level than all, which is faster, but int also does more work (it actually has to compute the integer value of s).

So which is faster? You have to profile both.

In [1]: s = "783c"

In [2]: import string

In [3]: %timeit all(c in string.hexdigits for c in s)
800 ns ± 3.23 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %%timeit
   ...: try:
   ...:   int(s, 16)
   ...: except ValueError:
   ...:   pass
   ...:
223 ns ± 1.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Looks like internal iteration wins. I tested on an 9-digit string as well, and int was still about 4x faster.

But what about invalid strings?

In [8]: s = 'g'

In [9]: %%timeit
   ...: try:
   ...:   int(s, 16)
   ...: except ValueError:
   ...:   pass
   ...:
1.09 µs ± 2.62 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [10]: %timeit all(c in string.hexdigits for c in s)
580 ns ± 6.55 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Now, we're basically testing the benefit of short-circuiting versus the cost of catching an exception. What happens if the error occurs later in the string?

In [11]: s = "738ab89ffg"

In [12]: %timeit all(c in string.hexdigits for c in s)
1.59 µs ± 19.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [13]: %%timeit
    ...: try:
    ...:   int(s, 16)
    ...: except ValueError:
    ...:   pass
    ...:
1.25 µs ± 19.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Now we're seeing the benefit to internal iteration again.