I'm using PETSc as the solver for my project. However, in parallel mode the solver creates many more processes than I expect.
The code is written in Python with petsc4py, and the machine has 4 cores. (a) If I run the script directly, PETSc uses only 1 process to assemble the matrix but creates 4 processes to solve the equations. (b) If I use the command 'mpirun -n 4', PETSc uses 4 processes to assemble the matrix but creates 16 processes to solve the equations.
I have checked my own Python code. The main part that creates the matrix is as follows:
m = PETSc.Mat().create(comm=PETSc.COMM_WORLD)
m.setSizes(((None, n_vnode[0]*3), (None, n_fnode[0]*3)))
m.setType('dense')
m.setFromOptions()
m.setUp()
m_start, m_end = m.getOwnershipRange()
for i0 in range(m_start, m_end):
    delta_xi = fnodes - vnodes[i0//3]
    temp1 = delta_xi ** 2
    delta_2 = np.square(delta) # delta_2 = e^2
    delta_r2 = temp1.sum(axis=1) + delta_2 # delta_r2 = r^2+e^2
    delta_r3 = delta_r2 * np.sqrt(delta_r2) # delta_r3 = (r^2+e^2)^1.5
    temp2 = (delta_r2 + delta_2) / delta_r3 # temp2 = (r^2+2*e^2)/(r^2+e^2)^1.5
    if i0 % 3 == 0: # x axis
        m[i0, 0::3] = ( temp2 + np.square(delta_xi[:, 0]) / delta_r3 ) / (8 * np.pi) # Mxx
        m[i0, 1::3] = delta_xi[:, 0] * delta_xi[:, 1] / delta_r3 / (8 * np.pi) # Mxy
        m[i0, 2::3] = delta_xi[:, 0] * delta_xi[:, 2] / delta_r3 / (8 * np.pi) # Mxz
    elif i0 % 3 == 1: # y axis
        m[i0, 0::3] = delta_xi[:, 0] * delta_xi[:, 1] / delta_r3 / (8 * np.pi) # Mxy
        m[i0, 1::3] = ( temp2 + np.square(delta_xi[:, 1]) / delta_r3 ) / (8 * np.pi) # Myy
        m[i0, 2::3] = delta_xi[:, 1] * delta_xi[:, 2] / delta_r3 / (8 * np.pi) # Myz
    else: # z axis
        m[i0, 0::3] = delta_xi[:, 0] * delta_xi[:, 2] / delta_r3 / (8 * np.pi) # Mxz
        m[i0, 1::3] = delta_xi[:, 1] * delta_xi[:, 2] / delta_r3 / (8 * np.pi) # Myz
        m[i0, 2::3] = ( temp2 + np.square(delta_xi[:, 2]) / delta_r3 ) / (8 * np.pi) # Mzz
m.assemble()
The main part that sets up the PETSc solver is as follows:
ksp = PETSc.KSP()
ksp.create(comm=PETSc.COMM_WORLD)
ksp.setType(solve_method)
ksp.getPC().setType(precondition_method)
ksp.setOperators(self._M_petsc)
ksp.setFromOptions()
ksp.solve(velocity_petsc, force_petsc)
Could anyone give me some suggestions? Thanks.
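Set the environment variable OMP_NUM_THREADS=1 before running the program. The extra workers are most likely threads started by a multithreaded (OpenMP) BLAS library during the solve, one set per MPI process, rather than additional MPI processes; that would explain 1 x 4 = 4 in case (a) and 4 x 4 = 16 in case (b).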
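If changing the shell environment is inconvenient, the variable can also be set from inside the script. The following is only a minimal sketch: it assumes the extra workers are OpenMP threads from the BLAS library behind NumPy/PETSc (the question does not say which BLAS is in use), and it assumes the library reads the variable when it is first loaded, so the assignment has to come before the numpy and petsc4py imports.

```python
import os

# Assumption: the threaded BLAS reads OMP_NUM_THREADS once, at load time,
# so this line must run before numpy / petsc4py are imported.
os.environ["OMP_NUM_THREADS"] = "1"

# The heavy imports would follow here in the real script, e.g.:
# import numpy as np
# from petsc4py import PETSc

# Confirm the value that any library loaded later will see.
omp_threads = os.environ["OMP_NUM_THREADS"]
print("OMP_NUM_THREADS =", omp_threads)
```

With this in place, each MPI rank started by mpirun should keep its BLAS calls on a single thread, so the total number of busy workers should match the number passed to 'mpirun -n'.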