Applying Reduce and Broadcast in MPI

I'm designing a Python program to compute the number of primes between k < n < k^2 for some fixed value of n. My original implementation computes the number of primes by splitting up the calculation by the number of processors.

My intention is to apply the reduce and broadcast commands to parallelize the work between all processors. I'm new to MPI and don't know how to do this, so I have commented out these two lines in code. I'm curious if there is a specific direction that I should follow to complete the reduce and broadcast commands in MPI. My original code is

import numpy as np
import platform
import sys
from mpi4py import MPI

comm = MPI.COMM_WORLD

id = comm.Get_rank( )

p = comm.Get_size( )
p=1

# Find the primes between 2 and k. Initialize k.
k=10
# Define a list S_k of the primes between 2 and k
primes=[]
# Define a list to store numbers that aren't prime between 2 and k.
not_prime = []
# Define a list S_k2 of the primes between k and k**2
primes2=[]


# Count the number of primes from 2 to k
for i in range(2, k+1):
    if i not in not_prime:
        primes.append(i)
        for j in range(i*i, k+1, i):
            not_prime.append(j)


# Find the number of primes between k and k**2
b=(k**2-k)/p
for n in range(int(k+(p-1)*b),int(k+(p)*b)):
    counter = 0
    for i in range(len(primes)):
        if (n % primes[i]) == 0:
            break
        else:
            counter = counter + 1
    if (counter==len(S_k)):
        primes2.append(n)

# I'm not sure what to use as parameters for comm.Reduce and comm.bcast
# comm.reduce = (primes2, op = MPI.SUM, root = 0 )
# comm.bcast |= (primes2, op = MPI.SUM, root = 0 )

print ("Number of processors: ",p)
print (primes2)
print((int(k+(p-1)*b),int(k+(p)*b)))

When testing against p=1, the code returns

Number of processors:  1
[11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
(10, 100)

When testing against p=2, the code returns

Number of processors:  2
[59, 61, 67, 71, 73, 79, 83, 89, 97]
(55, 100)

When testing against p=3, the code returns

Number of processors:  3
[71, 73, 79, 83, 89, 97]
(70, 100)

and as p increases, the amount of prime elements in primes2 decreases.

Ideally, I am investigating how to parallelize the code so that testing against p=3 returns

Number of processors:  3
Processor 1 computed [11, 13, 17, 19, 23, 29, 31, 37]
Processor 2 computed [41, 43, 47, 53, 59, 61, 67]
Processor 3 computed [71, 73, 79, 83, 89, 97]

I think that this can be done through applying the reduce and broadcast commands for MPI. I'm not sure how I should adjust

# comm.reduce = (primes2, op = MPI.SUM, root = 0 )
# comm.bcast |= (primes2, op = MPI.SUM, root = 0 )

so that the individual processors compute a subset of the primes.

Solution

#mpiexec -n 3 python n_loop.py

import numpy as np
import platform
import sys
from mpi4py import MPI

comm = MPI.COMM_WORLD

id = comm.Get_rank( )

p = comm.Get_size( )

# Find the primes between 2 and k. Initialize k.
k=20
# Define a list S_k of the primes between 2 and k
S_k=[]
# Define a list to store numbers that aren't prime between 2 and k.
not_prime = []
# Define a list S_k2 of the primes between k and k**2
S_k2=[]


# Count the number of primes from 2 to k
for i in range(2, k+1):
    if i not in not_prime:
        S_k.append(i)
        for j in range(i*i, k+1, i):
            not_prime.append(j)


# Find the number of primes between k and k**2 by
# pararllelizing the n-loop.
b=(k**2-k)/p
for n in range(int(k+(id)*b),int(k+(id+1)*b)):
    counter = 0
    for i in range(len(S_k)):
        if (n % S_k[i]) == 0:
            break
        else:
            counter = counter + 1
    if (counter==len(S_k)):
        S_k2.append(n)

# Compute the amount of primes in the two lists.
processor_num_primes = len(S_k2)
original_num_primes = len(S_k)

# Broadcast the amount of primes and calculate the unions
# of sets of integers.
countb = 0
totalb = 0
for i in range(0,p):
    countb = comm.bcast(S_k2,i)
    totalb = totalb + len(countb)
total = comm.reduce(totalb,MPI.BOR,0)


if (id == 0):
    print ("Total number of processors: ",p)
print ("Processor ",id, "is calculating:", S_k2)
print ("Processor ",id, "is calculating this many primes :",processor_num_primes)
print("Processor ",id,"has the following range:",(int(k+(id)*b),int(k+(id+1)*b)))
if (id == 0):
    print("The total number of primes found between",2,"and",k*k,"by the n-"
    "loop is:",total+original_num_primes)
print('')