azure, azure-data-factory, azure-batch

Periodic stdout and stderr from azure batch service


I'm running a pipeline in Azure Data Factory, and I'm using a custom activity to run an Azure Batch job.

The Azure Batch job I run is quite large, and I would like to monitor which stage the job is currently in. On a remote VM, I typically do this with Python's logging module.

I am able to get the job's status (i.e., all the logging output) once it has finished, but I would like to see it while the job is running.

How do I do this?


Solution

  • Batch automatically captures stdout/stderr into stdout.txt and stderr.txt in the task directory. Make sure you periodically flush your streams if needed. You have two options here:

    1. Implement logic within your program (executed as a Batch task) to periodically egress those files to some other place where you can view them, for example Azure Blob Storage (see the first sketch below).
    2. Implement logic on your client to periodically call GetFile and retrieve only the new offsets (ocp-range header) of either stdout.txt or stderr.txt (see the second sketch below). Various language SDKs have convenience APIs if you use those instead of REST.
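
A minimal sketch of the first option, assuming the azure-storage-blob package and a storage connection string; the container name, blob name and 60-second interval are placeholders. A background thread inside the task periodically copies the captured stdout.txt to a blob you can watch from anywhere:

```python
import os
import threading
import time
from azure.storage.blob import BlobClient  # pip install azure-storage-blob

# Placeholder values -- substitute your own storage account, container and blob name.
CONN_STR = "<storage-connection-string>"
CONTAINER = "task-logs"
BLOB_NAME = "myjob/mytask/stdout.txt"
# Batch writes the task's stdout.txt into the task directory (AZ_BATCH_TASK_DIR).
LOG_FILE = os.path.join(os.environ.get("AZ_BATCH_TASK_DIR", "."), "stdout.txt")
INTERVAL = 60  # seconds between uploads


def upload_log_periodically(stop: threading.Event) -> None:
    """Copy the task's stdout.txt to Blob Storage every INTERVAL seconds."""
    blob = BlobClient.from_connection_string(CONN_STR, CONTAINER, BLOB_NAME)
    while not stop.wait(INTERVAL):
        try:
            with open(LOG_FILE, "rb") as f:
                blob.upload_blob(f, overwrite=True)
        except FileNotFoundError:
            pass  # nothing captured yet


if __name__ == "__main__":
    stop = threading.Event()
    threading.Thread(target=upload_log_periodically, args=(stop,), daemon=True).start()

    # ... your actual workload; flush so the captured stdout.txt stays current ...
    for stage in range(5):
        print(f"finished stage {stage}", flush=True)
        time.sleep(10)

    stop.set()
```

Overwriting the blob on each upload keeps things simple; if you want a true append-only log you could use an append blob instead.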
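And a sketch of the second option from the client side, assuming the azure-batch Python SDK; the account URL, keys, job/task IDs and poll interval are placeholders. It polls the task, asks for the current size of stdout.txt, and fetches only the new byte range via the ocp-range header:

```python
import time
import azure.batch.models as batchmodels
from azure.batch import BatchServiceClient
from azure.batch.batch_auth import SharedKeyCredentials  # pip install azure-batch

# Placeholder values -- substitute your own account, job and task identifiers.
BATCH_URL = "https://<account>.<region>.batch.azure.com"
credentials = SharedKeyCredentials("<account-name>", "<account-key>")
client = BatchServiceClient(credentials, batch_url=BATCH_URL)

JOB_ID = "myjob"
TASK_ID = "mytask"
FILE = "stdout.txt"


def tail_task_file(poll_seconds: int = 30) -> None:
    """Poll stdout.txt and print only the bytes added since the last poll."""
    offset = 0
    while True:
        task = client.task.get(JOB_ID, TASK_ID)

        # HEAD request to learn how large the captured file currently is.
        props = client.file.get_properties_from_task(JOB_ID, TASK_ID, FILE, raw=True)
        size = int(props.response.headers["Content-Length"])

        if size > offset:
            # Fetch only the new byte range via the ocp-range header.
            opts = batchmodels.FileGetFromTaskOptions(
                ocp_range="bytes={}-{}".format(offset, size - 1))
            stream = client.file.get_from_task(
                JOB_ID, TASK_ID, FILE, file_get_from_task_options=opts)
            print(b"".join(stream).decode("utf-8", errors="replace"), end="")
            offset = size

        if task.state == batchmodels.TaskState.completed:
            break
        time.sleep(poll_seconds)


tail_task_file()
```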