Search code examples
distributed-computingnumerical-methodsfinite-element-analysis

Implementing domain decomposition algos as solvers vs preconditioners


Question

In learning parallel domain decomposition (DD) concepts, and subsequently interested in their implementation, the terms domain decomposition as a solver vs. as a preconditioner come up quite often, see below from "Numerical Solution of Partial Differential Equations on Parallel Computers" (Bruaset 2006):

Since the Schwarz methods introduced in Chapters 1 and 2 represent fixed-point iterations applied to preconditioned global problems, and consequently do not provide the fastest convergence possible, it is natural to apply Krylov methods instead. This provides the justification for using Schwarz methods as preconditioners rather than solvers.

As solvers, at least the parallel Schwarz method as a DD is quite clear, see below image from this talk, since one just solves in parallel on each domain and then exchanges boundary data. But how are parallel DD methods used as preconditioners for say the conjugate gradient method? And wouldn't global synchronizations as part of applying the preconditioner at each step in the preconditioned conjugate gradient algorithm be a major computational bottleneck?

enter image description here

Attempt

I think the idea is that you solve locally on each subdomain, and you then reduce-sum these solutions to get the preconditioner (in this case, the preconditioner M_inverse can be saved) for each step of the conjugate gradient algorithm acting on the global problem. However, this seems to be odd because in more complicated algorithms, e.g., balancing domain decomposition by constraints (BDDC), one would have to globally synchronize at each step of the conjugate gradient algorithm. Lastly, in solving the local problems, you could actually use a preconditioner for these linear solvers, so you end up with a sort of nested preconditioning algorithm (see end for pseudocode that perhaps clarifies this).

Definition of restricted additive Schwarz (RAS) method from "An Introduction to Domain Decomposition Methods Algorithms, Theory, and Parallel Implementation" (Dolean 2016):

enter image description here

Pseudocode for RAS as a preconditioner to a preconditioned conjugate gradient (PCG) method controlled by the master process:

def RAS(A_global, b_global, subdomain_indices):
  # Listing 3.2 from "An Introduction to Domain Decomposition Methods Algorithms, 
  # Theory, and Parallel Implementation" (Dolean 2016)

  # Get subdomain info
  subdomain_id = get_partition_id() # using MPI
  A_subdomain = A_global[subdomain_indices[subdomain_id]]
  b_subdomain = b_global[subdomain_indices[subdomain_id]]

  # local solve can ALSO have a preconditioner
  jacobi_preconditioner_subdomain = diagonal(A_subdomain)
  x_subdomain = linear_solver(
    A_subdomain, b_subdomain, preconditioner=jacobi_preconditioner_subdomain)

  # add the solutions on each subdomain 
  M_inverse = reduce_sum(x_subdomain) # using MPI, this is a global sync step

  return M_inverse


def preconditioned_conjugate_gradient(
  A_global, b_global, x0, subdomain_indices, n_iters):
  # Convergence/iteration checks of pcg occurs on the master process, 
  # and its preconditioner is the RAS
  # Algorithm 11.2 from "Scientific Computing An Introductory Survey" (Heath 2018)

  # initial residual
  rk = b_global - A_global @ x0

  # get the preconditioner via RAS and save this so that it may 
  # be used at each step of the pcg algo, SAVING M_INVERSE
  # IS NOT ALWAYS POSSIBLE AND APPLICATION OF PRECONDITIONER AT EACH
  # STEP IS MORE GENERAL ALGO!
  M_inverse = RAS(A_global, b_global, subdomain_indices)

  # APPLICATION OF PRECONDITIONER ON RESIDUAL!
  sk = M_inverse @ rk

  # cg iterations
  for k in range(n_iters)
    alpha_k = (rk.T @ sk)/(sk.T @ A @ sk)
    
    # x_{k+1}
    xk += alpha_k*sk

    # r_{k+1}
    rkp1 = rk - alpha_k*A @ sk

    betakp1 = (rkp1.T @ M_inverse @ rkp1)/(rk.T @ M_inverse @ rk)

    # APPLY PRECONDITIONER!!! s_{k+1}
    sk = M_inverse @ rkp1 + betakp1*sk 

    # update rk
    rk = rkp1

It is worth noting that M_inverse is generally not computed directly in practice, but rather the application (aka action, see "Acceleration of a parallel BDDC solver by using graphics processing units on subdomains" Sistek 2023) of the preconditioner M on the residual r is computed. The PCG method from "Numerical Linear Algebra with Julia" (Darve 2021) provided as pseudocode below more accurately illustrates this. See also a python BDDC implementation and corresponding Python PCG.

def pcg(A, b, maxiter, preconditioner = BDDC):
  # PCG pseudo code with preconditioner application from
  # pg. 275, 277, and 320 from Darve 2021
  n = A.shape[0]
  x = zeros(n)
  r = b
  p = zeros(n)
  rho = 1.0
  for i in range(1, maxiter+1)
    # THE PRECONDITIONER! In BDDC case, this returns the sum
    # of different correction operations such that 
    # z = v1 + v2 + v3 where the components of the sum use
    # the residual r 
    z = preconditioner.apply(r)
    ...
    ...

  return x 

Solution

  • Based on reading "A Preconditioner for Substructuring Based on Constrained Energy Minimization" (Dohrman 2006) and an implementation of "A method of Finite Element Tearing and Interconnecting (FETI) and its parallel solution algorithm" (Farhat 1991) available through AMfeti, I think the use of domain decomposition methods as preconditioners vs. solvers essentially comes down to the fact that after splitting up a domain into different pieces (known as a substructures), you solve the system on that subdomain, and then ensure that solutions at the boundaries of the subdomain don't "conflict" somehow.

    Getting the solutions at the subdomain boundaries not to "conflict" is how the domain decomposition methods differ, see the introduction in "A Non-overlapping domain decomposition method for solving elliptic problems by finite element methods" (Feng 1996). As an example, FETI uses Lagrange multipliers to ensure continuity (resolve the "conflict") of solutions at the interface. At least with the example of FETI (Farhat1991) and BDDC (Dohrman2006), FETI is introduced explicitly as a solver while BDDC is introduced explicitly as a preconditioner. So in the case of BDDC, a preconditioner M for which the operation ldiv(result, M, x) (equivalent to inv(M)*x or M \ x) is defined and called on each subdomain, and then a Krylov subspace method (e.g., preconditioned conjugate gradient) is applied to solve local problems and resolve interface conflicts accordingly.

    Note that the matrix M is never explicitly formed since the inverse inv(M) is expensive to compute. For an easy example of what defining M \ x looks like as opposed to explicitly computing inv(M), see the diagonal preconditioner from Preconditioners.jl. In that example, instead of computing the inverse of a diagonal matrix D, instead the definition of the inverse of a diagonal matrix is used (since the inverse is just the reciprocal of the diagonal entries), hence

    julia> using LinearAlgebra
    julia> A = rand(2,2)
    julia> x = rand(2)
    julia> D_vec = diag(A)
    julia> D_mat = Diagonal(D_vec) 
    julia> # you could put D_vec into a struct then define ldiv! also
    julia> @assert all(
      isapprox.(inv(D_mat)*x, [x[i] / D_vec[i] for i in 1:length(x)])) 
    

    So in essence, a domain decomposition method can be a solver itself OR the domain decomposition could create a preconditioner on each subdomain to be used in conjunction with an iterative solver (e.g., preconditioned conjugated gradient). I graphically summarize the use of domain decomposition methods as preconditioners for each subdomain for which separate iterative solvers then generates the solutions u_{i} and the boundary conflicts are subsequently resolved. Resolving the boundary conflict enforces a global solution on the entire domain, and this process continues until some convergence criteria is met.

    enter image description here

    One last verification of the fact that the preconditioner is different on each subdomain (which in a distributed setting, might each exist on separate compute nodes) is corroborated by "PETSc for Partial Differential Equations Numerical Solutions in C and Python" (Buehler 2021), I include this just to show a concrete verification of my claims in this answer. The excerpt below has bjacobi for block jacobi and asm for additive Schwarz method, respectively:

    enter image description here