python multithreading generator networkx

How to consume a python gneerator in parallel using multiprocessing?

How can I improve the performance of the networkx function local_bridges https://networkx.org/documentation/stable//reference/algorithms/generated/networkx.algorithms.bridges.local_bridges.html#networkx.algorithms.bridges.local_bridges

I have experimented using pypy - but so far am still stuck on consuming the generator on a single core. My graph has 300k edges. An example:

# construct the nx Graph:
import networkx as nx
# construct an undirected graph here - this is just a dummy graph
G = nx.cycle_graph(300000)

# fast - as it only returns an generator/iterator
lb = nx.local_bridges(G)

# individual item is also fast
%%time
next(lb)
CPU times: user 1.01 s, sys: 11 ms, total: 1.02 s
Wall time: 1.02 s

# computing all the values is very slow.
lb_list = list(lb)

How can I consume this iterator in parallel to utilize all processor cores? The current naive implementation is only using a single core!

My naive multi-threaded first try is:

import multiprocessing as mp
lb = nx.local_bridges(G)
pool = mp.Pool()
lb_list = list(pool.map((), lb))

However, I do not want to apply a specific function - () rather only get the next element from the iterator in parallel.

edit

I guess it boils down how to parallelize:

lb_res = []
lb = nx.local_bridges(G)
for node in range(1, len(G) +1):
    lb_res.append(next(lb))
    
lb_res

Naively using multiprocessing obviously fails:

# from multiprocessing import Pool
# https://stackoverflow.com/questions/41385708/multiprocessing-example-giving-attributeerror
from multiprocess import Pool
lb_res = []
lb = nx.local_bridges(G)

def my_function(thing):
    return next(thing)

with Pool(5) as p:
    parallel_result = p.map(my_function, range(1, len(G) +1))
    
parallel_result

But it is unclear to me how I can pass my generator as the argument to the map function - and fully consume the generator.

edit 2

For this particular problem, it turns out that the bottleneck is the shortest path computation for the with_span=True parameter. When disabled, it is decently fast.

When calculating the span is desired I would suggest cugraph with a fast implementation of SSSP on the GPU. Still, the iteration over the set of edges does not happen in parallel and should be improved further.

However, to learn more, I would be interested in understanding how to parallelize the consumption from a generator in python.

Solution

You can't consume a generator in parallel, every non-trivial generator's next state is determined by its current state. You have to call next() sequentially.

From https://github.com/networkx/networkx/blob/master/networkx/algorithms/bridges.py#L162 this is how the function is implemented

for u, v in G.edges:
    if not (set(G[u]) & set(G[v])):
        yield u, v

So you could parallelize it using something like this, but then you would have to incur the penalty of merging those individual lists using something like multiprocessing.Manager. I think it would just make the whole thing much slower, but you can time it yourself.

def process_edge(e):
    u, v = e
    lb_list = []
    if not (set(G[u]) & set(G[v])):
        lb_list.append((u,v))
with Pool(os.cpu_count()) as pool:
    pool.map(process_edge, G.edges)

Another way is to split the graph into ranges of vertices and process them concurrently.

def process_nodes(nodes):
    lb_list = []
    for u in nodes:
        for v in G[u]:
            if not (set(G[u]) & set(G[v])):
                lb_list.append((u,v))

with Pool(os.cpu_count()) as pool:
    pool.map(process_nodes, np.array_split(list(range(G.number_of_nodes())), 
os.cpu_count()))

Maybe you could also check if any better algorithms exist for this problem. Or find a faster library that's implemented in C.