We use GitLab as the backend for our Terraform state and lock files, and we commonly get errors acquiring the lock. It's clearly a capacity issue on their end: if we rerun the job, it almost always passes.
So I want to set up something that looks at the log and, if it sees that specific error text, reruns the job. Looking at their docs, I don't see that as a built-in possibility, but I can't imagine others haven't implemented something to do this.
While GitLab does not offer log-message detection as a retry feature, it is possible to build this yourself using the features that are available, including `retry:exit_codes:` -- or even by doing the retry handling entirely in your script.

The short version is this: your script catches the specific error message and, when it is found, either exits with a specific status code (which you list in `retry:exit_codes:`) or simply retries the command directly in the job's script logic.
There are obviously a lot of ways to handle this; one specific implementation in bash (or a similar shell) may look like the following. You add the special exit code to the `retry:exit_codes:` list in your job YAML, and make sure your script does not have `set -e` set at that point (or use `set +e`, restoring `set -e` later if desired):

```yaml
myjob:
  variables:
    # sometimes required for correct exit code detection with some executors
    # uncomment if your script always exits with exit code 1
    # FF_USE_NEW_BASH_EVAL_STRATEGY: "1"
  retry:
    max: 2  # GitLab caps retry:max at 2
    exit_codes:
      - 42
  script:
    - ./myscript.sh
```
This is untested code, but it should explain the basic idea:

```bash
#!/usr/bin/env bash
# myscript.sh
set +e           # don't abort immediately; we need to inspect the exit code
set -o pipefail  # make the pipeline return terraform's exit code, not tee's

err_msg="Error locking state"
special_code=42

terraform apply 2>&1 | tee output.txt  # overwrite, so a stale log can't match
exit_code=$?

if [[ $exit_code -ne 0 ]]; then
    if grep -q "$err_msg" output.txt; then
        exit $special_code   # retryable: matches retry:exit_codes in the YAML
    else
        exit $exit_code      # any other failure: don't trigger a retry
    fi
fi

set -e
# ...
```
Alternatively, just handle the retry logic entirely in bash -- for example, as described in the Unix & Linux Stack Exchange question: Re-running a command in case of a specific error message.
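If you'd rather skip `retry:exit_codes:` and keep everything in the script, a simple retry loop works too. This is only a sketch: the error string, the attempt count, the sleep interval, and the `retry_on_error` helper name are placeholders to adapt to your setup.

```bash
#!/usr/bin/env bash
# Retry a command in-script when its output contains a specific error message.
set -o pipefail  # let the pipeline report the command's exit code, not tee's

err_msg="Error locking state"  # placeholder: the message that marks a retryable failure
max_attempts=3                 # placeholder: total attempts before giving up
sleep_seconds=5                # placeholder: pause between attempts

retry_on_error() {
    local attempt=1 exit_code
    while true; do
        "$@" 2>&1 | tee output.txt  # capture output for inspection
        exit_code=$?

        # Success, or a failure that isn't the lock error: stop retrying.
        if [[ $exit_code -eq 0 ]] || ! grep -q "$err_msg" output.txt; then
            return "$exit_code"
        fi

        if (( attempt >= max_attempts )); then
            echo "Giving up after $max_attempts attempts" >&2
            return "$exit_code"
        fi

        echo "Lock error detected, retrying ($attempt/$max_attempts)..." >&2
        attempt=$((attempt + 1))
        sleep "$sleep_seconds"
    done
}

# Usage in the CI job:
# retry_on_error terraform apply
```

Because the loop retries only when the specific message appears, genuine failures still surface immediately with their original exit code, so the job fails fast on anything that isn't the lock contention.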