Julia Block Banded Matrices construction from arrays - Improve Speed

I am trying to construct and solve a matrix system that looks like this:

\begin{equation}
    \begin{bmatrix}
        B & D & 0 & \cdots &        & 0       \\
        A & B & D & \ddots &        & \vdots  \\
        0 & A & B & D      & \ddots &         \\
        \vdots & \ddots &   & \ddots & \\
        \vdots & \cdots & \cdots & A & B & D \\
        0      & \cdots & \cdots & 0 & A & B
    \end{bmatrix}
    \
    \begin{bmatrix}
       \Delta c
    \end{bmatrix}
    \
    =
    \
    \begin{bmatrix}
       G
    \end{bmatrix}
\end{equation}

This system comes from a 1-dimensional finite volume method for solving a system of equations (PDE's).

The large matrix (BBM) is constructed from the arrays A, B, D. The following code functions properly, but I am interested in making this code more computationally efficient.

I am curious if the stack overflow community has any suggestions. In particular is there a method to eliminate the for loop within the function?

using BlockBandedMatrices

function calculate_BBM(; A, B, D)
    # initialize the Block Banded Matrix (BBM)
    int_array = map(Int, N*ones(NJ));
    BBM = BlockBandedMatrix(Zeros{Float64}(N*NJ, N*NJ), int_array, int_array, (1,1))

    # Precompute indices to avoid calculating them in each iteration
    ind_begin = [(1 + N * (j - 1)) for j in 1:NJ]
    ind_end = [N * j for j in 1:NJ]

    for j in 2:NJ - 1
        # Use precomputed indices
        ib = ind_begin[j]
        ie = ind_end[j]

        BBM[ib:ie, (ib - N):(ie - N)] .= A[:, :, j - 1]
        BBM[ib:ie, ib:ie] .= B[:, :, j]
        BBM[ib:ie, (ib + N):(ie + N)] .= D[:, :, j]
    end

    j = 1
    ib = ind_begin[j]
    ie = ind_end[j]

    BBM[ib:ie, ib:ie] .= B[:, :, j]
    BBM[ib:ie, (ib + N):(ie + N)] .= D[:, :, j]


    j = NJ
    ib = ind_begin[j]
    ie = ind_end[j]

    BBM[ib:ie, (ib - N):(ie - N)] .= A[:, :, j - 1]
    BBM[ib:ie, ib:ie] .= B[:, :, j]

    return BBM

end

N = 3
NJ = 102

A = randn(N, N, NJ-1)
B = randn(N, N, NJ)
D = randn(N, N, NJ-1)

# A = -ones(N, N, NJ-1)
# B = ones(N, N, NJ) * 2
# D = ones(N, N, NJ-1)

BBM = calculate_BBM(A=A, B=B, D=D);

Solution

Using the Block(...) indexing method explained in BlockArrays.jl, the function can initially be simplified to look much nicer (and 2x faster):

# initialize the Block Banded Matrix (BBM)
function calculate_BBM(NJ; A, B, D)
    N = size(A,1)
    NJ_Ns = fill(N, NJ)
    BBM = BlockBandedMatrix(Zeros{Float64}(N*NJ, N*NJ), 
      NJ_Ns, NJ_Ns, (1,1))
           
    @views BBM[Block(1,1)] .= B[:, :, 1]
    @views BBM[Block(2,1)] .= D[:, :, 1]
           
    for j in 2:NJ - 1
        @views BBM[Block(j-1,j)] .= A[:, :, j - 1]
        @views BBM[Block(j,j)] .= B[:, :, j]
        @views BBM[Block(j+1,j)] .= D[:, :, j]
    end

    @views BBM[Block(NJ-1,NJ)] .= A[:, :, NJ - 1]
    @views BBM[Block(NJ,NJ)] .= B[:, :, NJ]

    return BBM
end

Also note the function now takes NJ as a parameter and gets N from the size of an input block.

ADDED: improved version (without NJ parameter):

# initialize the Block Banded Matrix (BBM)
function calculate_BBM(; A, B, D)
    N,NJ = size(A,1), size(B,3)
    size(A) == size(D) == (N,N,NJ-1) || 
      error("input matrix sizes clash")
    size(B) == (N,N,NJ) || error("input matrix sizes clash")
    NJ_Ns = fill(N, NJ)
    BBM = BlockBandedMatrix(Zeros{Float64}(N*NJ, N*NJ), NJ_Ns, NJ_Ns, (1,1))

    for (i, M) in enumerate(eachslice(B; dims=3))
        BBM[Block(i,i)] .= M
    end
    for (i, (S, T)) in enumerate(zip(eachslice(D; dims=3), 
                                     eachslice(A; dims=3)))
        BBM[Block(i+1,i)] .= S
        BBM[Block(i,i+1)] .= T
    end
    return BBM
end

UPDATE: Fixed 2nd solution to use Zeros.