Script to generate Markdown files with embedded PlantUML diagrams for GitLab's PlantUML renderer

I am setting up a repository to store software documentation made up of several Markdown documents, and I want to embed PlantUML diagrams in them. The repository is hosted on GitLab, which includes a PlantUML renderer but does not support preprocessing, so I cannot use the !include directive to reference diagrams stored in other files.

I would like a bash or Python script that:

1. Finds all .md files and appends their contents, one after another, to a new file "all-docs.md".
2. Searches "all-docs.md" for every !include [FILEPATH] directive and replaces it with the content found between @startuml and @enduml in [FILEPATH].

For example, suppose "all-docs.md" contains this section:

Here is the Profile class diagram:

```plantuml
@startuml
!include ./data-models/profile.puml
Profile o-- UidObject
@enduml
```

and profile.puml contains:

@startuml
class Profile <UidObject> {
    + string name
    + string email
    + string phone
    + Date birthDate
}
@enduml

After running the script, "all-docs.md" should contain:

Here is the Profile class diagram:

```plantuml
@startuml
class Profile <UidObject> {
    + string name
    + string email
    + string phone
    + Date birthDate
}
Profile o-- UidObject
@enduml
```
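To try this example locally, the two files can be recreated in a scratch directory. This is a sketch: the file name docs/profile.md is hypothetical, while the include path ./data-models/profile.puml is taken from the example above. Note that the scripts below open the included file relative to the directory they are run from, so data-models/ is placed at the root of the scratch directory.

```shell
#!/usr/bin/env bash
# Recreate the example from the question in a scratch directory.
# docs/profile.md is a hypothetical file name; the include path comes from the example.
set -euo pipefail

demo=$(mktemp -d)          # scratch directory standing in for the repository root
cd "$demo"
mkdir -p docs data-models

# The fence is kept in a variable so this snippet can itself live inside a
# Markdown code block without closing it early.
fence='```'
printf '%s\n' \
  'Here is the Profile class diagram:' \
  '' \
  "${fence}plantuml" \
  '@startuml' \
  '!include ./data-models/profile.puml' \
  'Profile o-- UidObject' \
  '@enduml' \
  "$fence" > docs/profile.md

# The included diagram: @startuml on the first line, @enduml on the last.
cat > data-models/profile.puml <<'EOF'
@startuml
class Profile <UidObject> {
    + string name
    + string email
    + string phone
    + Date birthDate
}
@enduml
EOF

printf 'Example created in %s; run the scripts from that directory.\n' "$demo"
```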

The repository has the following structure:

/
├── assets/
├── docs/
├── uml/

The assets/ directory contains images, icons, and other resources.
The docs/ directory contains the documents (Markdown files).
The uml/ directory contains the PlantUML source files used to generate the diagrams in the documentation.

Solution

A bash and find solution for the input and files you gave could look like this:

#!/usr/bin/env bash

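#: Find all .md files in the docs/ directory and concatenate them into all-docs.md.
#: Truncate all-docs.md first so that re-running the script does not append duplicate content.
: > all-docs.md
find docs/ -type f -name '*.md' -exec sh -c 'cat -- "$@" >> all-docs.md' sh {} +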

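#: Parse all-docs.md and print the result: each "!include FILE" line is replaced
#: by the contents of FILE without its first (@startuml) and last (@enduml) line.
#: "|| [[ -n $line ]]" also handles a last line without a trailing newline.
while IFS= read -ru "$fd" line || [[ -n $line ]]; do
  if [[ $line == "!include"* ]]; then
    temp=${line#!include *}          #: Extract the file path after "!include ".
    mapfile -t plum < "$temp" &&     #: Read the .puml file into an array,
    unset -v 'plum[-1]' &&           #: drop its last line (@enduml)
    printf '%s\n' "${plum[@]:1}"     #: and print all but its first line (@startuml).
  else
    printf '%s\n' "$line"            #: Any other line is printed as is.
  fi
done {fd}< all-docs.md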

If you are satisfied with the output and want to write the changes back to all-docs.md permanently, replace the last line of the script with:

done {fd}< all-docs.md > tempfile && mv tempfile all-docs.md
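find lists files in whatever order the file system returns them, so the order of the documents inside all-docs.md is not guaranteed and can differ between machines. If a stable order matters, one possible alternative for the concatenation step is bash's globstar option, since pathname expansion results are sorted (according to the current locale). This is a sketch; the function name concat_docs is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: concatenate every .md file under a docs directory in sorted order.
# concat_docs is a hypothetical helper, not part of the answer above.
shopt -s globstar nullglob   # globstar: ** recurses into subdirectories; nullglob: no match -> empty list

concat_docs() {
  local docs_dir=$1 out=$2
  local files=("$docs_dir"/**/*.md)   # expansion results are sorted, unlike find's output
  if (( ${#files[@]} == 0 )); then
    printf 'no .md files found under %s\n' "$docs_dir" >&2
    return 1
  fi
  # '>' truncates the output first, so re-running does not append a second copy.
  cat -- "${files[@]}" > "$out"
}
```

Run from the repository root, concat_docs docs all-docs.md concatenates the same Markdown files as the find command, in a predictable order.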

Alternatively, parse the output of find directly without creating all-docs.md first:

#!/usr/bin/env bash

##: Find all the files ending in .md in the docs/ directory
##: conCATenate all the contents of the files in question.
find docs/ -type f -name '*.md' -exec sh -c 'cat -- "$@"' sh {} + | {
  ##: "|| [[ -n $line ]]" also handles a last line without a trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line == "!include"* ]]; then       ##: If line has the pattern !include, parse it.
      temp=${line#!include *}                 ##: Extract FILE_PATH into a variable named temp.
      if [[ -s "$temp" ]]; then               ##: If temp is an existing, non-empty file,
        mapfile -u3 -t plum 3< "$temp" &&     ##: read it into an array via fd 3,
        unset -v 'plum[-1]' &&                ##: drop the last line (@enduml)
        printf '%s\n' "${plum[@]:1}"          ##: and print all but the first line (@startuml).
      else
        printf '%s\n' "$line"                 ##: Otherwise print the line as is.
      fi
    else
      printf '%s\n' "$line"                   ##: Not an !include line: print it as is.
    fi
  done
}

If you do need all-docs.md to be created, change the last line to:

} > all-docs.md
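Both scripts assume that @startuml is exactly the first line and @enduml exactly the last line of the included file; a leading comment or a trailing blank line in the .puml file breaks that assumption. A more tolerant sketch copies only the lines strictly between the two markers. The function name extract_puml_body is hypothetical.

```shell
#!/usr/bin/env bash
# Print only the lines strictly between @startuml and @enduml in a .puml file.
# extract_puml_body is a hypothetical helper name.
extract_puml_body() {
  local file=$1 line inside=0
  # "|| [[ -n $line ]]" also processes a last line that has no trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    case $line in
      @startuml*) inside=1 ;;   # start marker: copy from the next line on
      @enduml*)   inside=0 ;;   # end marker: stop copying
      *)
        if (( inside )); then
          printf '%s\n' "$line"
        fi
        ;;
    esac
  done < "$file"
}
```

In either script, the three lines starting with mapfile could then be replaced by a single call, extract_puml_body "$temp".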

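mapfile (also known as readarray) requires bash 4.0 or later and the {fd} redirection requires bash 4.1 or later; the negative index in unset -v 'plum[-1]' needs bash 4.3 or later.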
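Because the scripts depend on these bash features, a small guard can make them fail early with a readable message on an older bash, for example the bash 3.2 that macOS ships by default. This is a sketch: the function name require_bash is hypothetical, and the 4.3 minimum in the usage line is an assumption chosen to cover the negative array index in unset.

```shell
#!/usr/bin/env bash
# Fail early with a clear message when the running bash is older than required.
# require_bash is a hypothetical helper name.
require_bash() {
  local need_major=$1 need_minor=$2
  local major=${BASH_VERSINFO[0]} minor=${BASH_VERSINFO[1]}
  if (( major < need_major || (major == need_major && minor < need_minor) )); then
    printf 'bash %s.%s or later is required (running %s)\n' \
      "$need_major" "$need_minor" "$BASH_VERSION" >&2
    return 1
  fi
}

# Usage at the top of either script (4.3 assumed as the minimum, to cover unset 'plum[-1]'):
#   require_bash 4 3 || exit 1
```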