python · azure-batch

Can a batch task read files on a file share?


I have a file share with (you guessed it) a lot of files. I want to create a Batch job that mounts this file share, then reads in each of the files and processes each one in parallel (each as a Batch task).

Is this possible to do with Python and Azure Batch? Any tutorial showing how to do this would be great.


Solution

  • You can do this in one of two ways. Note that the following applies to Linux only; Windows users will need to follow a slightly different method using User Identities.

    1. Mount the file share at the compute node level using the pool's StartTask object. See the Azure Files documentation on how to do this for your Linux distro. The start task can either:
      • Mount the file share directly, i.e., call `mount -t cifs ...`. This persists across reboots, as the StartTask is re-run on every reboot.
      • Modify /etc/fstab to add an entry for automounting. Note that you must make this operation idempotent, as the StartTask is re-run on every reboot.
    2. Mount the file share at the job level using the job's JobPreparationTask object. The command you specify here runs once per compute node for the job, before any of the job's tasks run on that node. You should probably also specify the job's JobReleaseTask to unmount the share for cleanup.
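To make the first option concrete, the StartTask (or JobPreparationTask) command line can be assembled in Python before creating the pool. This is only a sketch: the storage account name, share name, key, and mount point below are placeholders, and the CIFS options follow the Azure Files documentation for mounting on Linux.

```python
def build_cifs_mount_command(account, share, key, mount_point="/mnt/batchshare"):
    """Build a shell command line for a pool StartTask that mounts an
    Azure file share over CIFS. All argument values are placeholders;
    substitute your own storage account name, share name, and key."""
    return (
        "/bin/bash -c '"
        f"mkdir -p {mount_point} && "
        f"mount -t cifs //{account}.file.core.windows.net/{share} {mount_point} "
        f"-o vers=3.0,username={account},password={key},"
        "dir_mode=0777,file_mode=0777,serverino'"
    )

cmd = build_cifs_mount_command("mystorage", "myshare", "<storage-key>")
```

Because the StartTask re-runs on every reboot, the `mkdir -p` keeps the command safe to repeat.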

    Whichever path you choose, make sure the task is granted proper elevation privileges (typically superuser) so that the process can perform the mount or modify /etc/fstab.
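In the Batch REST API, that elevation maps to the task's `userIdentity`; with the Python SDK the equivalent is `models.UserIdentity(auto_user=models.AutoUserSpecification(scope='pool', elevation_level='admin'))`. A sketch of the corresponding StartTask request body (the mount command is a placeholder):

```python
# Sketch of a StartTask body as sent to the Batch REST API.
# "elevationLevel": "admin" runs the command as superuser so mount works;
# "waitForSuccess": True keeps tasks from being scheduled on the node
# before the share is mounted.
start_task = {
    "commandLine": "/bin/bash -c 'mount -t cifs ...'",  # placeholder command
    "userIdentity": {
        "autoUser": {
            "scope": "pool",
            "elevationLevel": "admin",
        }
    },
    "waitForSuccess": True,
}
```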

    If you go with the first option, the mount is available to the compute node at all times, regardless of whether a job that requires it runs on that node. There are advantages and disadvantages to each approach; your requirements (compliance or technical, for example) should guide which you choose.
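If the StartTask takes the /etc/fstab route, the edit must be idempotent because the StartTask re-runs on every reboot. One way to sketch that in Python (the entry format and path are illustrative, and the script must run with the elevation discussed above):

```python
def ensure_fstab_entry(entry, fstab_path="/etc/fstab"):
    """Append `entry` to fstab only if an identical line is not already
    present, so re-running the StartTask after a reboot adds nothing."""
    with open(fstab_path, "r+") as f:
        content = f.read()
        if entry in content.splitlines():
            return  # already there; do nothing (idempotent)
        if content and not content.endswith("\n"):
            f.write("\n")
        f.write(entry + "\n")
```

Running it a second time is a no-op, which is exactly the property the StartTask needs.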