Assemble eigen3 sparsematrix from smaller sparsematrices

I am assembling the jacobian of a coupled multi physics system. The jacobian consists of a blockmatrix on the diagonal for each system and off diagonal blocks for the coupling. I find it the best to assemble to block separatly and then sum over them with projection matrices to get the complete jacobian. pseudo-code (where J[i] are the diagonal elements and C[ij] the couplings, P are the projections to the complete matrix).

// diagonal blocks
J.setZero();
for(int i=0;i<N;++i){
    J+=P[i]J[i]P[i].transpose()
}
// off diagonal elements
for(int i=0;i<N;++i){
    for(int j=i+1;j<N;++j){
        J+=P[i]C[ij]P[j].transpose()
        J+=P[j]C[ji]P[i].transpose()
    }
}

This takes a lot performance, around 20% of the whole programm, which is too much for some assembling. I have to recalculate the jacobian every time step since the system is highly nonlinear. Valgrind indicates that the ressource consuming method is Eigen::internal::assign_sparse_to_sparse and in this method the call to Eigen::SparseMatrix<>::InsertBackByOuterInner.

Is there a more efficient way to assemble such a matrix?

(I also had to use P*(JP.transpose()) instead of PJ*J.transpose() to make the programm compile, may be there is already something wrong)

P.S: NDEBUG and optimizations are turned on

Edit: by storing P.transpose in a extra matrix ,I get a bit better performance, but the summation accounts still for 15% of the programm

Solution

Your code will be much faster by working inplace. First, estimate the number of non-zeros per column in the final matrix and reserve space (if not already done):

int nnz_per_col = ...;
J.reserve(VectorXi::Constant(n_cols, nnz_per_col));

If the number of nnz per column is highly non-uniform, then you can also compute it per column:

VectorXi nnz_per_col(n_cols);
for each j
  nnz_per_col(j) = ...;
J.reserve(nnz_per_col);

Then manually insert elements:

for each block B[k]
  for each elements i,j
    J.coeffRef(foo(i),foo(j)) += B[k](i,j)

where foo implement the appropriate mapping of indices.

And for the next iteration, no need to reserve, but you need to set coefficient values to zero while preserving the structure:

J.coeffs().setZero();