# How to parallelize this Python for loop when using Numba

**Question:**

I'm using the Anaconda distribution of Python, together with Numba, and I've written the following Python function that multiplies a sparse matrix **A** (stored in CSR format) by a dense vector `x`:

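```
import numpy
from numba import jit

@jit
def csrMult( x, Adata, Aindices, Aindptr, Ashape ):
    # Multiply a CSR matrix (Adata, Aindices, Aindptr) by the dense vector x.
    numRowsA = Ashape[0]
    Ax = numpy.zeros( numRowsA )
    for i in range( numRowsA ):
        Ax_i = 0.0
        # The nonzeros of row i are Adata[ Aindptr[i] : Aindptr[i+1] ].
        for dataIdx in range( Aindptr[i], Aindptr[i+1] ):
            j = Aindices[dataIdx]
            Ax_i += Adata[dataIdx] * x[j]
        Ax[i] = Ax_i
    return Ax
```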
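To make the CSR layout concrete, here is a small pure-Python sketch of the same double loop. The 4x3 matrix, the vector, and the name `csr_matvec` are illustrative assumptions and not part of the code above; the three lists play the roles of `A.data`, `A.indices` and `A.indptr`.

```python
# Dense matrix used for illustration (hypothetical data):
#   [[10,  0,  0],
#    [ 0, 20, 30],
#    [ 0,  0,  0],
#    [40,  0, 50]]
data = [10.0, 20.0, 30.0, 40.0, 50.0]  # nonzero values, stored row by row
indices = [0, 1, 2, 0, 2]              # column index of each stored value
indptr = [0, 1, 3, 3, 5]               # row i occupies data[indptr[i]:indptr[i+1]]


def csr_matvec(x, data, indices, indptr):
    """Plain-Python CSR matrix-vector product (same loops as csrMult)."""
    num_rows = len(indptr) - 1
    result = [0.0] * num_rows
    for i in range(num_rows):
        row_sum = 0.0
        # An empty row has indptr[i] == indptr[i+1], so this loop is skipped.
        for k in range(indptr[i], indptr[i + 1]):
            row_sum += data[k] * x[indices[k]]
        result[i] = row_sum
    return result


x = [1.0, 2.0, 3.0]
print(csr_matvec(x, data, indices, indptr))  # [10.0, 130.0, 0.0, 190.0]
```

Note that each pass of the outer loop reads shared inputs but writes only `result[i]`, which is the property the rest of the question relies on.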

Here **A** is a large `scipy` sparse matrix,

```
>>> A.shape
( 56469, 39279 )
# having ~ 142,258,302 nonzero entries (so about 6.4% )
>>> type( A[0,0] )
dtype( 'float32' )
```

and **x** is a `numpy` array. Here is a snippet of code that calls the above function:

```
x = numpy.random.randn( A.shape[1] )
Ax = A.dot( x )
AxCheck = csrMult( x, A.data, A.indices, A.indptr, A.shape )
```

Notice the **@jit** decorator, which tells Numba to do a just-in-time compilation of the `csrMult()` function.

In my experiments, my function `csrMult()` is about **twice as fast** as the `scipy` **.dot()** method. That is a pretty impressive result for Numba.

However, MATLAB still performs this matrix-vector multiplication about **6 times faster** than `csrMult()`. I believe that is because MATLAB uses multithreading when performing sparse matrix-vector multiplication.

How can I parallelize the outer `for`-loop when using Numba?

Numba used to have a **prange()** function that made it simple to parallelize embarrassingly parallel `for`-loops, but `prange()` was missing from the version I was using (see Edit 1 below). What is the correct way to parallelize this `for`-loop now that Numba's `prange()` function is gone? When `prange()` was removed from Numba, what alternative did the developers of Numba have in mind?

Edit 1:

I updated to the latest version of Numba, which is 0.35, and `prange()` is back! It was not included in version 0.33, the version I had been using.

That is good news, but unfortunately I am getting an error message when I attempt to parallelize my `for` loop using `prange()`. Here is a parallel for loop example from the Numba documentation (see section 1.9.2 "Explicit Parallel Loops"), and below is my new code:

```
import numpy as np
from numba import njit, prange

@njit( parallel=True )
def csrMult_numba( x, Adata, Aindices, Aindptr, Ashape):
    numRowsA = Ashape[0]
    Ax = np.zeros( numRowsA )
    # prange marks the outer (row) loop as parallel.
    for i in prange( numRowsA ):
        Ax_i = 0.0
        for dataIdx in range( Aindptr[i],Aindptr[i+1] ):
            j = Aindices[dataIdx]
            Ax_i += Adata[dataIdx] * x[j]
        Ax[i] = Ax_i
    return Ax
```

When I call this function, using the code snippet given above, I receive the following error:

AttributeError: Failed at nopython (convert to parfors) 'SetItem' object has no attribute 'get_targets'

Given that the above attempt to use `prange` crashes, my question stands: **What is the correct way** (using `prange` or an alternative method) **to parallelize this Python for-loop?**

As noted below, it was trivial to parallelize a similar for loop in C++ and obtain an **8x** speedup when run with **20** OpenMP threads. There must be a way to do it using Numba, since the for loop is embarrassingly parallel (and since sparse matrix-vector multiplication is a fundamental operation in scientific computing).
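Here is a minimal sketch of why the loop counts as embarrassingly parallel: each row's result depends only on read-only inputs, so blocks of rows can be computed independently and stitched together afterwards. The names `csr_rows` and `csr_matvec_chunked`, the toy data, and the chunk size are hypothetical. This pure-Python version is not expected to run faster (threads are limited by the GIL); it only illustrates the independence that `prange` or OpenMP would exploit.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy CSR data (hypothetical) for the 4x3 matrix
#   [[10, 0, 0], [0, 20, 30], [0, 0, 0], [40, 0, 50]]
data = [10.0, 20.0, 30.0, 40.0, 50.0]
indices = [0, 1, 2, 0, 2]
indptr = [0, 1, 3, 3, 5]


def csr_rows(x, data, indices, indptr, row_start, row_stop):
    """Compute entries row_start..row_stop-1 of A*x without touching other rows."""
    out = []
    for i in range(row_start, row_stop):
        row_sum = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            row_sum += data[k] * x[indices[k]]
        out.append(row_sum)
    return out


def csr_matvec_chunked(x, data, indices, indptr, chunk_size=2, workers=2):
    """Split the rows into chunks, compute each chunk separately, then concatenate."""
    num_rows = len(indptr) - 1
    starts = range(0, num_rows, chunk_size)

    def work(start):
        stop = min(start + chunk_size, num_rows)
        return csr_rows(x, data, indices, indptr, start, stop)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so chunks line up by row.
        parts = list(pool.map(work, starts))
    return [value for part in parts for value in part]


print(csr_matvec_chunked([1.0, 2.0, 3.0], data, indices, indptr))
# [10.0, 130.0, 0.0, 190.0]
```

No locking is needed because no two chunks ever write the same output entry.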

Edit 2:

Here is my C++ version of `csrMult()`. Parallelizing the `for()` loop in the C++ version makes the code about 8x faster in my tests. This suggests to me that a similar speedup should be possible for the Python version when using Numba.

```
#include <vector>
#include <Eigen/Dense>  // VectorXd is assumed to come from Eigen

using Eigen::VectorXd;
using std::vector;

void csrMult(VectorXd& Ax, VectorXd& x, vector<double>& Adata, vector<int>& Aindices, vector<int>& Aindptr)
{
    // This code assumes that the size of Ax is numRowsA.
    #pragma omp parallel num_threads(20)
    {
        #pragma omp for schedule(dynamic,590)
        for (int i = 0; i < Ax.size(); i++)
        {
            double Ax_i = 0.0;
            for (int dataIdx = Aindptr[i]; dataIdx < Aindptr[i + 1]; dataIdx++)
            {
                Ax_i += Adata[dataIdx] * x[Aindices[dataIdx]];
            }
            Ax[i] = Ax_i;
        }
    }
}
```

**Solution:**

Numba has been updated and `prange()` works now! (I'm answering my own question.)

The improvements to Numba's parallel computing capabilities are discussed in this blog post, dated December 12, 2017. Here is a relevant snippet from the blog:

> Long ago (more than 20 releases!), Numba used to have support for an idiom to write parallel for loops called `prange()`. After a major refactoring of the code base in 2014, this feature had to be removed, but it has been one of the most frequently requested Numba features since that time. After the Intel developers parallelized array expressions, they realized that bringing back `prange` would be fairly easy

Using Numba version 0.36.1, I can parallelize my embarrassingly parallel `for`-loop using the following simple code:

```
import numba
import numpy as np

@numba.jit(nopython=True, parallel=True)
def csrMult_parallel(x,Adata,Aindices,Aindptr,Ashape):
    numRowsA = Ashape[0]
    Ax = np.zeros(numRowsA)
    # Each row is independent, so the outer loop runs in parallel.
    for i in numba.prange(numRowsA):
        Ax_i = 0.0
        for dataIdx in range(Aindptr[i],Aindptr[i+1]):
            j = Aindices[dataIdx]
            Ax_i += Adata[dataIdx]*x[j]
        Ax[i] = Ax_i
    return Ax
```

In my experiments, parallelizing the `for`-loop made the function execute about eight times faster than the version I posted at the beginning of my question, which was already using Numba but was not parallelized. Moreover, in my experiments the parallelized version is about 5x faster than the command `Ax = A.dot(x)`, which uses scipy's sparse matrix-vector multiplication function. **Numba has crushed scipy** and I finally have a Python sparse matrix-vector multiplication routine that is **as fast as MATLAB**.
