My lab just got a sponsorship from Microsoft Azure and I'm exploring how to utilize it. I'm new to industrial level cloud service and pretty confused about tons of terminologies and concepts. In short, here is my scenario:
The idea system configuration would be a bunch of identical VMs and a shared file system. Then I can submit my job with 'qsub' from a script or something. Is there a way to do this on Azure?
I investigated the Batch Service, but having trouble installing dependencies after creating compute node. I also had trouble with storage. So far I only saw examples of using Batch with Blob storage, with is unstructured.
So are there any other services in Azure can meet my requirement?
I somehow figured it out my self based on the article: https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-classic-hpcpack-cluster/. Here is my solution:
Create a HPC Pack with a Windows head node and a set of Linux compute node. Here are several useful template in Marketplace.
From Head node, we can execute command inside Linux compute node, either inside HPC Cluster Manager, or using "clusrun" inside PowerShell. We can easily install dependencies via apt-get for computing node.
Create a File share inside one of the storage account. This can be mounted by all machines inside the cluster.
One glitch here is that for some encryption reason, you can not mount the File share on Linux machines outside the Azure. There are two solutions in my head: (1) mount the file share to Windows head node, and create file sharing from there, either by FTP or SSH. (2) create another Linux VM (as a bridge), mount the File share on that VM and use "scp" to communicate with it from outside. Since I'm not familiar with Windows, I adopted the later solution.
For executable, I simply uploaded the binary executable compiled on my local machine. Most dependencies are statically linked. There are still a few dynamic objects, though. I upload these dynamic object to the Azure and set LD_LIBRARY_PATH when execute programs on computing node.
Job submission is done in Windows head node. To make it more flexible, I wrote a python script, which writes XML files. The Job Manager can load these XML files to create a job. Here are some instructions: https://msdn.microsoft.com/en-us/library/hh560266(v=vs.85).aspx
I believe there should be more a elegant solution with Azure Batch Service, but so far my small cluster runs pretty well with HPC Pack. Hope this post can help somebody.