bash google-apps-script parallel-processing background-process clasp

Run multiple Google Apps Script clasp commands in parallel using a Bash script

I have several hundred Google Apps Script projects and a variety of Bash scripts for managing them with the clasp tool (a Node.js app). Many of the scripts need to run clasp pull first to fetch the projects locally before acting on the local files, so I have a script that loops through the local clasp project folders and runs clasp pull in each. The loop iterates through the directories sequentially, so at 3-4 seconds per pull it takes 5-6 minutes per 100 projects.

My goal is to be able to run the clasp pull commands in parallel so that they all start at the same time, and to be able to know which projects were successfully pulled vs which projects failed to be pulled.

Given a directory structure like this:

├── project-1
│   ├── .clasp.json
│   ├── .claspignore
│   ├── _main.js
│   └── appsscript.json
├── project-2
│   ├── .clasp.json
│   ├── .claspignore
│   ├── _main.js
│   └── appsscript.json
├── project-3
│   ├── .clasp.json
│   ├── .claspignore
│   ├── _main.js
│   └── appsscript.json
└── pull_all.sh

And this pull_all.sh Bash script:

#!/bin/bash

# use Node 14.17.5 to prevent "Error: Looks like you are offline." errors
# (see https://github.com/google/clasp/issues/872)
[ -s "/usr/local/opt/nvm/nvm.sh" ] && . "/usr/local/opt/nvm/nvm.sh"
nvm install 14.17.5
nvm use 14.17.5

find . -name '.clasp.json' | 
while read file; do
    (
        cd "$(dirname "$file")"
        project_dir_name="$(basename "$(pwd)")"
        echo "Pulling project ($project_dir_name)"
        clasp pull
    ) &
done

When I run this script, it prints the "Pulling project" line for each directory and then returns to a shell prompt, implying that the script has finished executing. But 3-4 seconds later, without any user input, it shows the output of all the clasp pull commands (apparently running in parallel, because some of the output is out of order or overlapping), then hangs without giving a new shell prompt. At this point I have to press Ctrl+C to terminate the script.

The complete output ends up looking like this:

$ ./pull_all.sh
v14.17.5 is already installed.
Now using node v14.17.5 (npm v6.14.14)
Now using node v14.17.5 (npm v6.14.14)
Pulling project (project-3)
Pulling project (project-2)
Pulling project (project-1)
$
Cloned 2 files.
⠙ Pulling files…└─ appsscript.json
└─ _main.js
Cloned 2 files.
└─ _main.js
└─ appsscript.json
Cloned 2 files.
└─ _main.js

To force one of the scripts to fail, I can change the scriptId to an invalid script ID in any of the .clasp.json files. In this case I do see the expected output of:

Could not find script.
Did you provide the correct scriptId?
Are you logged in to the correct account with the script?

... but it's still mixed in with the rest of the output and it's not clear which project that came from.

How can I make it so that:

1. The script does not cause a new shell prompt to appear while it is still executing.
2. The script outputs a line indicating the success or failure of each clasp pull operation, referenced by the directory name of the project (where the .clasp.json file was found).
3. Bonus: suppress the output of clasp pull so the script only shows the success or failure result of each project (referenced by the directory name).

Note: I've mentioned clasp pull as an example command, but a valid solution would allow me to run any clasp command as a background process in a bash while loop, including, but not limited to clasp push, clasp deploy, etc.

Solution

I'd suggest the following solution:

#!/usr/bin/env bash

# use Node 14.17.5 to prevent "Error: Looks like you are offline." errors
# (see https://github.com/google/clasp/issues/872)
[ -s "/usr/local/opt/nvm/nvm.sh" ] && . "/usr/local/opt/nvm/nvm.sh"
nvm install 14.17.5
nvm use 14.17.5

# Check and process command line
if (( $# < 1 )); then
    echo "Usage: $(basename "$0") ACTION [ARG]..."
    exit 2
fi
action="$1"
args=("${@:2}")

# Define cleanup handler, create temporary log directory
trap '[[ -n "$(jobs -p)" ]] && kill -- -$$; [[ -n "${logdir}" ]] && rm -rf "${logdir}"' EXIT
logdir=$(mktemp -d)

# Start specified action for each project
declare -A pid_pro_map=() pid_log_map=()
readarray -t files < <(find . -name '.clasp.json' -printf "%P\n" | sort -V)
for file in "${files[@]}"; do
    project=$(dirname "${file}")
    logfile=$(mktemp -p "${logdir}")
    ( cd "${project}" && clasp "${action}" "${args[@]}" ) &>"${logfile}" &
    pid=$!; pid_pro_map[${pid}]="${project}"; pid_log_map[${pid}]="${logfile}"
    echo -e "Started action '\e[1m${action}\e[0m' for project '\e[1m${project}\e[0m' (pid ${pid})"
done

# Wait for background jobs to finish and report results
echo -e "\nWaiting for background jobs to finish...\n"
jobs_done=0; jobs_total=${#files[@]}
while true; do
    wait -n -p pid; result=$?
    [[ -z "${pid}" ]] && break
    jobs_done=$((jobs_done + 1))
    if (( ${result} == 0 )); then
        echo -e "Action '\e[1m${action}\e[0m' for project '\e[1m${pid_pro_map[${pid}]}\e[0m' (pid ${pid}) (${jobs_done}/${jobs_total}): \e[1;32mSUCCESS\e[0m"
    else
        echo -e "Action '\e[1m${action}\e[0m' for project '\e[1m${pid_pro_map[${pid}]}\e[0m' (pid ${pid}) (${jobs_done}/${jobs_total}): \e[1;31mFAILURE\e[0m"
        cat "${pid_log_map[${pid}]}"
    fi
done

Features:

Runs any action supported by clasp (e.g. pull, push, deploy)
Performs the specified action for each project in parallel in the background
Output produced by clasp is suppressed (but captured to be printed in case of failure)
Waits for background tasks to finish and reports results as soon as they become available
Provides information regarding success/failure for each project (including output produced by clasp for further analysis in case of failure)
Displays current progress (in the form of <projects-done>/<projects-total>)
Colored output for increased readability

Requirements:

Bash >= 5.1 (details: Bash >= 5.1 for wait -p, Bash >= 4.3 for wait -n, Bash >= 4.0 for associative arrays)

GNU find (part of findutils) for find ... -printf "%P\n"; Possible workaround:

readarray -t files < <(find . -name '.clasp.json' | sort -V)
for file in "${files[@]}"; do
    project=$(dirname "${file#'./'}")
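
If Bash 5.1 is not available, the reporting loop can probably be adapted to wait on each recorded PID explicitly instead of using wait -n -p. The following is only a minimal sketch under that assumption: it still needs Bash >= 4.0 for associative arrays, true and false stand in for clasp calls, and the project names and the pids and status variables are made up for illustration.

```shell
#!/usr/bin/env bash
# Fallback sketch for Bash 4.0-5.0 (no 'wait -p'): wait on each PID explicitly.
# 'true' and 'false' stand in for clasp calls; the project names are hypothetical.
declare -A pid_pro_map=() status=()
pids=()
for project in project-ok project-bad; do
    if [[ "${project}" == "project-bad" ]]; then false & else true & fi
    pid=$!; pids+=("${pid}"); pid_pro_map[${pid}]="${project}"
done

for pid in "${pids[@]}"; do
    project="${pid_pro_map[${pid}]}"
    # 'wait PID' returns that job's exit status, even if it has already finished
    if wait "${pid}"; then
        status[${project}]="SUCCESS"
    else
        status[${project}]="FAILURE"
    fi
    echo "${project}: ${status[${project}]}"
done
```

The trade-off is that results are reported in launch order rather than as soon as each job finishes, so one slow project delays the lines for everything started after it.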

In response to a comment, here is a possible tweak to limit the number of concurrent background jobs being spawned:

# Start specified action for each project
max_jobs=25; poll_delay="0.1s"
declare -A pid_pro_map=() pid_log_map=()
readarray -t files < <(find . -name '.clasp.json' -printf "%P\n" | sort -V)
for file in "${files[@]}"; do
    if (( ${max_jobs} > 0 )); then
        while jobs=$(jobs -r -p | wc -l) && (( ${jobs} >= ${max_jobs} )); do
            sleep "${poll_delay}"
        done
    fi
    project=$(dirname "${file}")
    logfile=$(mktemp -p "${logdir}")
    ( cd "${project}" && clasp "${action}" "${args[@]}" ) &>"${logfile}" &
    pid=$!; pid_pro_map[${pid}]="${project}"; pid_log_map[${pid}]="${logfile}"
    echo -e "Started action '\e[1m${action}\e[0m' for project '\e[1m${project}\e[0m' (pid ${pid})"
done
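
To try out the throttling loop above without touching any real projects, something like the following stand-alone sketch might help. It assumes sleep 0.2 as a stand-in for a clasp call, and the started and peak counters are hypothetical names added only to observe the behavior.

```shell
#!/usr/bin/env bash
# Throttle demo: 'sleep 0.2' stands in for a clasp call (assumption).
max_jobs=3; poll_delay="0.05"
started=0; peak=0
for i in 1 2 3 4 5 6 7 8; do
    # Block while the number of running jobs is at the limit
    while running=$(jobs -r -p | wc -l) && (( running >= max_jobs )); do
        sleep "${poll_delay}"
    done
    sleep 0.2 &
    started=$((started + 1))
    # Record the highest number of running jobs observed so far
    running=$(jobs -r -p | wc -l)
    (( running > peak )) && peak=$((running))
done
wait
echo "started=${started} peak=${peak} (limit ${max_jobs})"
```

The observed peak should never exceed max_jobs, which makes it easy to experiment with different limits and poll delays before pointing the real script at several hundred projects.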

Additionally, this could be employed to cut the number of background processes being spawned in half:

( cd "${project}" && exec clasp "${action}" "${args[@]}" ) &>"${logfile}" &

This will replace the subshell's process with clasp, which should be perfectly fine as the subshell loses its usefulness right after executing cd anyway.
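
The effect of exec can be checked with a small stand-alone sketch. Here bash -c stands in for clasp (an assumption made so that the command can print its own PID), and child_pid is a hypothetical variable name.

```shell
#!/usr/bin/env bash
# Demo: with 'exec', the PID stored in $! is the PID of the command itself.
# 'bash -c' stands in for clasp (assumption) so that it can print its own PID.
logfile=$(mktemp)
( cd / && exec bash -c 'echo "$$"' ) &>"${logfile}" &
pid=$!
wait "${pid}"
child_pid=$(<"${logfile}")
rm -f "${logfile}"
echo "background pid: ${pid}, command pid: ${child_pid}"
```

Both PIDs should be identical, because exec replaces the subshell process instead of forking a child. A side benefit is that sending a signal to the recorded PID then reaches the command directly.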