pythonnumpyperformancenumba# Why is Jitted Numba function slower than original function?

### Update: Possible computer specific size dependence

I've written a function to create uniformly spaced points on a disk and since it's run quite often and on relatively large array I figured the application of `numba`

would increase the speed significantly. However, upon running a quick test I've found that the `numba`

function is more than twice as slow.

Is there a way to figure out what is slowing down the `numba`

function?

Here's the function:

```
@njit(cache=True)
def generate_points_turbo(centre_point, radius, num_rings, x_axis=np.array([-1, 0, 0]), y_axis=np.array([0, 1, 0])):
"""
Generate uniformly spaced points inside a circle
Based on algorithm from:
http://www.holoborodko.com/pavel/2015/07/23/generating-equidistant-points-on-unit-disk/
Parameters
----------
centre_point : np.ndarray (1, 3)
radius : float/int
num_rings : int
x_axis : np.ndarray
y_axis : np.ndarray
Returns
-------
points : np.ndarray (n, 3)
"""
if num_rings > 0:
delta_R = 1 / num_rings
ring_radii = np.linspace(delta_R, 1, int(num_rings)) * radius
k = np.arange(num_rings) + 1
points_per_ring = np.rint(np.pi / np.arcsin(1 / (2*k))).astype(np.int32)
num_points = points_per_ring.sum() + 1
ring_indices = np.zeros(int(num_rings)+1)
ring_indices[1:] = points_per_ring.cumsum()
ring_indices += 1
points = np.zeros((num_points, 3))
points[0, :] = centre_point
for indx in range(len(ring_radii)):
theta = np.linspace(0, 2 * np.pi, points_per_ring[indx]+1)
points[ring_indices[indx]:ring_indices[indx+1], :] = ((ring_radii[indx] * np.cos(theta[1:]) * x_axis[:, None]).T
+ (ring_radii[indx] * np.sin(theta[1:]) * y_axis[:, None]).T)
return points + centre_point
```

And it's called like this:

```
centre_point = np.array([0,0,0])
radius = 1
num_rings = 15
generate_points_turbo(centre_point, radius, num_rings )
```

Would be great if someone knows why the function is slower when `numba`

compiled or how to go about finding out what the bottleneck for the `numba`

function is.

It seems the `numba`

function is working, but the cross-over between where it's faster and slower maybe be hardware specific.

```
%timeit generate_points(centre_point, 1, 2)
99.5 µs ± 932 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%timeit generate_points_turbo(centre_point, 1, 2)
213 µs ± 8.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit generate_points(centre_point, 1, 20)
647 µs ± 11.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit generate_points_turbo(centre_point, 1, 20)
314 µs ± 8.74 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit generate_points(centre_point, 1, 200)
11.9 ms ± 375 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit generate_points_turbo(centre_point, 1, 200)
7.9 ms ± 243 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

After about 12-15 rings the `numba`

function (*_turbo) starts to become a similar speed or faster on my machine, but the performance gains at larger size are smaller than expected. But seems like it is actually working, just that some part of the function is heavily size dependent.

Solution

I got rid of all your transpositions / newaxis / 3D stuff that you were not using and got a x20 boost compared to your original solution. I replaced a `range`

with a `prange`

too for good measure, as you don't care in which order your points are calculated.

```
# Imports.
import matplotlib.pyplot as plt
from numba import njit, prange
import numpy as np
# "Turbo" function.
@njit(cache=True)
def generate_points_turbo(centre_point, radius, num_rings):
"""
Generate uniformly spaced points inside a circle
Based on algorithm from:
http://www.holoborodko.com/pavel/2015/07/23/generating-equidistant-points-on-unit-disk/
Parameters
----------
centre_point : np.ndarray (2,)
radius : float/int
num_rings : int
x_axis : np.ndarray
y_axis : np.ndarray
Returns
-------
points : np.ndarray (n, 2)
"""
if not num_rings > 0:
return
delta_R = 1 / num_rings
ring_radii = np.linspace(delta_R, 1, num_rings) # Use a unit circle that we will scale only at the end.
k = np.arange(num_rings) + 1
points_per_ring = np.rint(np.pi / np.arcsin(1 / (2*k))).astype(np.int32)
num_points = points_per_ring.sum() + 1
points = np.zeros((num_points, 2))
n = 1 # n == 0 is the central point by design.
for ring_number in prange(len(ring_radii)):
r = ring_radii[ring_number] # The local radius between 1/num_rings and 1.
points_on_this_ring = points_per_ring[ring_number]
theta = np.linspace(0, 2 * np.pi, points_on_this_ring)
points[n: n+points_on_this_ring, 0] = r * np.cos(theta)
points[n: n+points_on_this_ring, 1] = r * np.sin(theta)
n += points_on_this_ring
return points * radius + centre_point
# Test that the result is accurate.
if __name__ == "__main__":
centre_point = np.array([0, 0])
radius = 3.14159
num_rings = 10
p = generate_points_turbo(centre_point, radius, num_rings)
fig, ax = plt.subplots()
ax.set_aspect(1)
ax.scatter(*p.T)
fig.show()
```

```
# Test time taken.
>>> from timeit import timeit
>>> from initial_code import generate_points_turbo as generate_points_turbo_stackoverflow
>>> %timeit generate_points_turbo(centre_point, radius, num_rings)
>>> 13.5 µs ± 21.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit generate_points_turbo_stackoverflow(np.array([0, 0, 0]), radius, num_rings)
>>> 261 µs ± 1.03 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

- Python Jinja2 LaTeX Table
- Getting attributes of a class
- How can I print many significant figures in Python?
- How to allow list append() method to return the new list
- Calculate Last Friday of Month in Pandas
- Python type hint for Iterable[str] that isn't str
- How to iterate over a list in chunks
- How to exit the entire application from a Python thread?
- Running shell command and capturing the output
- How do I pass a variable by reference?
- Convert range(r) to list of strings of length 2 in python
- How can I get the start and end dates for each week?
- how to use send_message() in python-telegram-bot
- Python conditional replacement based on element type
- How can I count the number of items in an arbitrary iterable (such as a generator)?
- Find longest consecutive range of numbers in list
- Insert text in braces with asyncpg
- How does one put a link / url to the web-site's home page in Django?
- How to determine if a path is a subdirectory of another?
- Custom Keybindings for Ipython terminal
- FastAPI asynchronous background tasks blocks other requests?
- How to make sure that information from one file is duplicated into several text documents, without specific lines
- Installing a Python environment with Anaconda
- sklearn pipeline model predicting same results for all input
- Brew command not found after installing Anaconda Python
- How to get an XPath from selenium webelement or from lxml?
- Pipe PuTTY console to Python script
- How to align the axes of a figure in matplotlib?
- Persist ParentDocumentRetriever of langchain
- How to reset index in a pandas dataframe?