
Script to generate Markdown files with embedded PlantUML diagrams for GitLab's PlantUML renderer


I am setting up a repository to store software documentation consisting of several documents written in Markdown, and I want to be able to embed PlantUML diagrams in them. The repository is hosted on GitLab, which includes a PlantUML renderer but does not allow preprocessing, so the !include directive cannot be used to reference diagrams in other files.

I would like to have a bash or python script that:

  1. Searches for all .md files and appends their content, one after the other, to a new file "all-docs.md".
  2. Searches "all-docs.md" for all !include [FILEPATH] clauses and replaces each one with the content between @startuml and @enduml from the file [FILEPATH].

For example:

"all-docs.md" contains, at some point:

Here is the Profile class diagram:

```plantuml
@startuml
!include ./data-models/profile.puml
Profile o-- UidObject
@enduml
```

And profile.puml content is:

@startuml
class Profile <UidObject> {
    + string name
    + string email
    + string phone
    + Date birthDate
}
@enduml

After running the script, "all-docs.md" should contain:

Here is the Profile class diagram:

```plantuml
@startuml
class Profile <UidObject> {
    + string name
    + string email
    + string phone
    + Date birthDate
}
Profile o-- UidObject
@enduml
```

The repo has the following structure.

/
├── assets/
├── docs/
└── uml/

  • The assets/ directory contains various assets such as images, icons, and other resources.
  • The docs/ directory contains the documents (Markdown files).
  • The uml/ directory contains PlantUML source files that are used to generate diagrams for the software documentation.

Solution

  • A bash and find solution with your given input/files, something like:

    #!/usr/bin/env bash
    
    #: Find and concatenate all .md files from the docs/ directory into all-docs.md.
    #: Using > (not >>) truncates the file first, so re-running does not duplicate content.
    find docs/ -type f -name '*.md' -exec cat -- {} + > all-docs.md
    
    #: Parse the all-docs.md file and 
    #: create/print the desired result/output.
    while IFS= read -ru "$fd" line; do
      if [[ $line == "!include"* ]]; then
        temp=${line#!include *}
        mapfile -t plum < "$temp" &&
        unset -v 'plum[-1]' &&
        printf '%s\n' "${plum[@]:1}"
      else
        printf '%s\n' "$line"
      fi
    done {fd}< all-docs.md
    

    • If you're satisfied with the output and want the changes written back to all-docs.md, replace the last line of the code with:

    done {fd}< all-docs.md > tempfile && mv tempfile all-docs.md

    Or just parse the output of find directly without creating the all-docs.md, something like:

    #!/usr/bin/env bash
    
    ##: Find all the files ending in .md in the docs/ directory
    ##: conCATenate all the contents of the files in question.
    find docs/ -type f -name '*.md' -exec sh -c 'cat -- "$@"' sh {} + | {
      while IFS= read -r line; do
        [[ $line != "!include"* ]] && { ##: If line does not have the pattern !include.
          printf '%s\n' "$line"         ##: Print the line as is.
        }
        [[ $line == "!include"* ]] && { ##: If line has the pattern !include parse it.
          temp=${line#!include *}       ##: Extract FILE_PATH in a variable named temp.
          if [[ -s "$temp" ]]; then     ##: If variable is an existing non-empty file.
            mapfile -u3 -t plum 3< "$temp" && ##: Extract the desired result.
            unset -v 'plum[-1]' &&
            printf '%s\n' "${plum[@]:1}"
          else
            printf '%s\n' "$line"  ##: Otherwise just print the line as is.
          fi
        }
      done
    }
    

    If creating all-docs.md is a requirement, change the last line to:

    } > all-docs.md
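
    Since the question also allows Python, here is a minimal sketch of the concatenation step alone. One thing it does differently: find makes no promise about the order in which it visits files, so this version sorts the paths to get a reproducible all-docs.md. The function name concatenate_markdown and its parameters are made up for illustration, and UTF-8 files are assumed.

```python
from pathlib import Path


def concatenate_markdown(docs_dir: str, output: str) -> list[Path]:
    """Write every .md file under docs_dir into output, in sorted path order."""
    out_path = Path(output).resolve()
    # Sort for a reproducible order; skip the output file itself in case
    # it happens to live under docs_dir.
    sources = sorted(
        p
        for p in Path(docs_dir).rglob("*.md")
        if p.is_file() and p.resolve() != out_path
    )
    parts = []
    for path in sources:
        text = path.read_text(encoding="utf-8")
        # Ensure each document ends with a newline so files do not run together.
        parts.append(text if text.endswith("\n") else text + "\n")
    out_path.write_text("".join(parts), encoding="utf-8")
    return sources
```

    Calling concatenate_markdown("docs", "all-docs.md") from the repository root would mirror the find command above, apart from the sorted order.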
    

    • mapfile (aka readarray) requires bash 4.0+, the {fd} redirection requires bash 4.1+, and the negative index in unset -v 'plum[-1]' requires bash 4.3+.
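
    The bash versions above drop the first and last line of the included file, which assumes @startuml and @enduml sit exactly there (a trailing blank line in the .puml file would break that). A rough Python sketch of the include expansion that looks for the markers instead might be useful on systems with an older bash. The names diagram_body and expand_includes are hypothetical, paths are resolved relative to a base_dir argument, and a file without markers is assumed to be diagram source throughout.

```python
import re
from pathlib import Path

# Matches a line such as "!include ./data-models/profile.puml".
INCLUDE_RE = re.compile(r"^\s*!include\s+(\S.*?)\s*$")


def diagram_body(puml_text: str) -> list[str]:
    """Return the lines strictly between @startuml and @enduml."""
    lines = puml_text.splitlines()
    start = next(
        (i for i, l in enumerate(lines) if l.strip().startswith("@startuml")),
        None,
    )
    if start is None:
        return lines  # assumption: no markers means the whole file is the body
    body = []
    for line in lines[start + 1:]:
        if line.strip().startswith("@enduml"):
            break
        body.append(line)
    return body


def expand_includes(markdown: str, base_dir: str = ".") -> str:
    """Replace each `!include PATH` line with the body of the diagram at PATH."""
    out = []
    for line in markdown.splitlines():
        match = INCLUDE_RE.match(line)
        target = Path(base_dir, match.group(1)) if match else None
        if target is not None and target.is_file():
            out.extend(diagram_body(target.read_text(encoding="utf-8")))
        else:
            out.append(line)  # not an include, or file missing: keep as is
    return "\n".join(out) + "\n"
```

    As in the second bash script, an !include line whose file cannot be found is left untouched, so a broken reference stays visible in the output.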