In numpy, taking the square root and raising to the power 1/2 are almost indistinguishable in speed. However, for the inverse square root, raising to the power -1/2 is about 10x slower than dividing 1 by the square root.
# Python 3.10.2; numpy 1.22.1; clang-1205.0.22.11; macOS 12.1
import numpy as np
arr = np.random.uniform(0, 1, 10000)
print("Square Root")
%timeit -n 10000 np.sqrt(arr)
%timeit -n 10000 arr**(1/2)
print("Inverse Square Root")
%timeit -n 10000 1 / np.sqrt(arr)
%timeit -n 10000 np.sqrt(1/arr)
%timeit -n 10000 arr**(-1/2)
Square Root
10.3 µs ± 315 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
10.9 µs ± 321 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Inverse Square Root
19.1 µs ± 744 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
19.4 µs ± 791 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
196 µs ± 5.69 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Can someone more familiar with the source implementation explain the difference?
numpy makes special cases of exponents in {-1, 0, 0.5, 1, 2}, but no others. There was an issue opened in 2017 to add -2 and -0.5 to this set, but it appears that nothing has yet been done in that direction.
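Until such a special case exists, a practical workaround is to spell out the inverse square root in terms of operations that are already fast. The following is only a minimal sketch: `inv_sqrt` is a hypothetical helper name (not a numpy function), and the input range is chosen here to stay away from zero so the division is well defined.

```python
import numpy as np

def inv_sqrt(x):
    """Hypothetical helper: compute x**(-1/2) as 1 / sqrt(x)."""
    x = np.asarray(x, dtype=float)
    return 1.0 / np.sqrt(x)

# Sample data, kept away from 0 to avoid division by zero (an assumption of this sketch).
rng = np.random.default_rng(0)
arr = rng.uniform(0.1, 1.0, 10_000)

via_sqrt = inv_sqrt(arr)       # sqrt followed by a division
via_power = arr ** (-0.5)      # general power with exponent -0.5

# The two spellings should agree to floating-point tolerance.
print(np.allclose(via_sqrt, via_power))
```

The two results agree numerically, so in code where this operation is hot, swapping `arr**(-1/2)` for `1 / np.sqrt(arr)` should give the speedup seen in the timings in the question without changing the result beyond rounding.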