Why is numpy inverse square root "x**(-1/2)" so much slower than "1/np.sqrt(x)" and "np.sqrt(1/x)"

In numpy, the square root and raising to the power 1/2 are almost indistinguishable in speed. However, when comparing the inverse square root with raising to the power -1/2, the latter is about 10x slower.

# Python 3.10.2; numpy 1.22.1; clang-1205.0.22.11; macOS 12.1
import numpy as np

arr = np.random.uniform(0, 1, 10000)

print("Square Root")
%timeit -n 10000 np.sqrt(arr)
%timeit -n 10000 arr**(1/2)

print("Inverse Square Root")
%timeit -n 10000 1 / np.sqrt(arr)
%timeit -n 10000 np.sqrt(1/arr)
%timeit -n 10000 arr**(-1/2)

Square Root
10.3 µs ± 315 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
10.9 µs ± 321 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Inverse Square Root
19.1 µs ± 744 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
19.4 µs ± 791 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
196 µs ± 5.69 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Can someone more familiar with the source implementation explain the difference?

Solution

numpy special-cases exponents in {-1, 0, 0.5, 1, 2}, but no others, so any other exponent, including -0.5, goes through the general power routine. There was an issue opened in 2017 to add -2 and -0.5 to this set, but it appears that nothing has been done in that direction yet.
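Until that changes, a practical workaround is to spell the inverse square root with the fast paths yourself. The following is a minimal sketch, not part of numpy: `inv_sqrt` is a hypothetical helper name, and the timings will vary with machine and numpy version (the special-case set above is as reported for the version in the question and may differ in other releases).

```python
import timeit

import numpy as np


def inv_sqrt(x):
    """Hypothetical helper: inverse square root via np.sqrt plus one division."""
    return 1.0 / np.sqrt(x)


rng = np.random.default_rng(0)
# Lower bound of 0.1 (an arbitrary choice) keeps zeros out of the denominator.
arr = rng.uniform(0.1, 1.0, 10_000)

# Both spellings agree up to floating-point rounding.
assert np.allclose(inv_sqrt(arr), arr ** -0.5)

# Rough comparison with the standard-library timeit module.
candidates = {
    "1 / np.sqrt(arr)": lambda: inv_sqrt(arr),
    "arr ** -0.5": lambda: arr ** -0.5,
}
for label, fn in candidates.items():
    seconds = timeit.timeit(fn, number=1000)
    print(f"{label:>18}: {seconds / 1000 * 1e6:.1f} µs per loop")
```

If the gap reported in the question holds on your setup, the first line should come out noticeably faster than the second.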