python · docker · google-cloud-platform · logging · cron

How to Make Batch Job Logs Available When the Jobs Run Inside Ephemeral Docker Containers?


Context

So, basically I am running a cron job (a Python ETL script) via a Docker container. That means every day at 12:30 AM my cron job runs

docker run $IMAGE 

In the Dockerfile I invoke the script like this:

# Run the script at container boot time.
CMD ["./run_manager.sh"]

This is what run_manager.sh looks like:

python3 main.py >> main.log 2>&1

I am using the Python logging module like this:

#!/usr/bin/env python3
# encoding: utf-8

"""
This file contains the script
"""
import logging
from contextlib import AbstractContextManager
import polars as pl
import tensorflow as tf
import sqlalchemy as sa

logging.basicConfig(format='%(asctime)s|%(levelname)s: %(message)s',
                    datefmt='%H:%M:%S, %d-%b-%Y', level=logging.INFO)

...
# Other code

Question

Since the container is ephemeral, created and destroyed every day when the cron job is triggered, I have no way to access the log. So how do I change this so that the logs persist, rotate, and are visible outside the container? Is there a way?

Addendum

Right now it is running as a cron job on an on-prem Ubuntu instance, but I am going to migrate it to Google Cloud Scheduler very soon, keeping the design intact as much as possible. Is there a solution for that case as well, basically, so that I can see the logs of past jobs?


Solution

  • In a container you usually don't log to a file. Since the container has an isolated filesystem, it can be tricky to extract the log file. The more common setup is to have the container log to stdout.

    With what you've shown, the logging module already writes to the process's standard streams (basicConfig with no stream argument logs to stderr, which Docker captures the same way as stdout), so you just need to remove the redirection in your wrapper script. If that redirection is the only thing the wrapper script does, you don't even need it; you can remove the wrapper script entirely and just have

    ENV PYTHONUNBUFFERED=1
    CMD ["./main.py"]
    

    in your Dockerfile. (The script already has a correct "shebang" line, so the command doesn't need to invoke python3 explicitly; you do need to make sure you've run chmod +x main.py on the host system to mark the script as executable. The ENV line disables Python's output buffering so log lines show up immediately instead of being held back; see also Why doesn't Python app print anything when run in a detached docker container?)
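
    If you'd rather be explicit about which stream the logs go to, you can pass one to basicConfig yourself. A minimal sketch, reusing the format from the question; the choice of sys.stdout here is a preference, not a requirement, since Docker captures both streams:

    import logging
    import sys

    # stream=sys.stdout overrides the logging module's default of sys.stderr;
    # Docker captures both streams, so this is purely a matter of preference.
    logging.basicConfig(stream=sys.stdout,
                        format='%(asctime)s|%(levelname)s: %(message)s',
                        datefmt='%H:%M:%S, %d-%b-%Y',
                        level=logging.INFO)

    logging.info("ETL run started")  # shows up in `docker logs` immediately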

    In the form you currently show, docker run will print the logs directly to its own stdout. If your cron daemon is set up to email the results of cron jobs, you'll get the logs in email. More generally, you can retrieve these logs with docker logs so long as the container isn't deleted.
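
    If you also want a copy of each run's logs kept on the host, the cron entry can call docker logs itself before cleaning up. A sketch of such a wrapper; the image name etl-image and the directory /var/log/etl are hypothetical placeholders, not from the question:

    #!/bin/sh
    # Hypothetical cron wrapper: adjust the image name and log path.
    IMAGE=etl-image
    NAME=etl-$(date +%Y%m%d)

    # No --rm here: the stopped container must survive for `docker logs`.
    docker run --name "$NAME" "$IMAGE"

    # Save the run's stdout and stderr to a dated file, then clean up.
    docker logs "$NAME" > "/var/log/etl/$NAME.log" 2>&1
    docker rm "$NAME"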

    In a cloud environment, this is the "normal" way of getting logs out of a container process. If you ran this in Kubernetes, for example, you'd use kubectl logs rather than docker logs, but the underlying mechanism is the same. I'd expect that anything capable of running a container and reporting its logs will work, so long as you log to stdout (or stderr) and not to a file.
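
    For instance (the Kubernetes job name below is hypothetical, and the Cloud Logging filter is only an example): in Kubernetes a completed Job keeps its pods around until the Job is deleted, and on Google Cloud the stdout/stderr of managed containers lands in Cloud Logging, so past runs stay queryable after the container itself is gone.

    # Kubernetes: read the logs of a completed Job's pod.
    kubectl logs job/etl-daily

    # Google Cloud: query the logs of past runs from Cloud Logging.
    gcloud logging read 'severity>=INFO' --limit=50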