azure-machine-learning-service

What is the best practice for folder structure in Azure Machine Learning service (preview) projects?


I'm very excited about the newly released Azure Machine Learning service (preview), which is a great step up from the previous (and deprecated) Machine Learning Workbench.

However, I have been thinking a lot about the best practice for structuring the folders and files in my project(s). I'll try to explain my thoughts.

Looking at the documentation for training a model (e.g. Tutorial #1), it seems to be good practice to put all training scripts and any additional scripts they need inside a subfolder, so that the subfolder can be passed to the Estimator object without also passing all the other files in the project. This is fine.

But when deploying the service, specifically when deploying the image, the documentation (e.g. Tutorial #2) seems to indicate that the scoring script needs to be located in the root folder. If I try to refer to a script located in a subfolder, I get this error message:

WebserviceException: Unable to use a driver file not in current directory. Please navigate to the location of the driver file and try again.

This may not be a big deal, except that I have some additional scripts that I import in both the training script and the scoring script, and I don't want to duplicate them just so that both scripts can import them.

I am working mainly in Jupyter Notebooks when executing the training and the deployment, and I could of course use some tricks to read the particular scripts from some other folder, save them to disk as a copy, execute the training or deployment while referring to the copies and finally delete the copies. This would be a decent workaround, but it seems to me that there should be a better way than just decent.
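For illustration, the copy-then-delete workaround described above could be sketched roughly as follows. This is only a minimal sketch using the Python standard library; the helper name `temporary_copies` and all paths are hypothetical, and the actual training or deployment call would go inside the `with` block.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_copies(sources, target_dir):
    """Copy each file in `sources` into `target_dir`; delete the copies on exit."""
    target_dir = Path(target_dir)
    copies = []
    try:
        for src in sources:
            dst = target_dir / Path(src).name
            if dst.exists():
                # Never clobber a file that was already there.
                raise FileExistsError(f"refusing to overwrite {dst}")
            shutil.copy2(src, dst)
            copies.append(dst)
        yield copies
    finally:
        # Remove only the files this helper created.
        for dst in copies:
            if dst.exists():
                dst.unlink()


# Demonstration in a throwaway directory (all names here are made up).
demo_root = Path(tempfile.mkdtemp())
(demo_root / "shared").mkdir()
(demo_root / "deploy").mkdir()
(demo_root / "shared" / "common.py").write_text("VALUE = 1\n")

with temporary_copies([demo_root / "shared" / "common.py"],
                      demo_root / "deploy") as copied:
    # The training or deployment step would run here, seeing the copy.
    exists_during = copied[0].exists()

exists_after = (demo_root / "deploy" / "common.py").exists()
shutil.rmtree(demo_root)
```

The context manager keeps the cleanup in one place, so the copies are removed even if the step inside the block raises.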

What do you think?


Solution

  • Currently, score.py needs to be in the current working directory, but dependency scripts (passed via the dependencies argument to ContainerImage.image_configuration) can be in a subfolder.

    Therefore, you should be able to use a folder structure like this:

    ./score.py 
    ./myscripts/train.py 
    ./myscripts/common.py
    

    Note that the relative folder structure is preserved during web service deployment: if you reference the common file in the subfolder from your score.py, that reference should remain valid within the deployed image.
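As a rough local illustration of that layout, the sketch below builds the same folder structure in a temporary directory and checks that a score.py in the root can import from myscripts/common.py. It does not call the Azure ML SDK at all; the `preprocess` helper and the `run` signature are made up for the example, and the import relies on standard Python 3 behaviour (a folder without `__init__.py` is importable as a namespace package when its parent is on `sys.path`).

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build the layout from the answer in a throwaway directory.
project_root = Path(tempfile.mkdtemp())
(project_root / "myscripts").mkdir()

# Hypothetical shared helper, importable by both train.py and score.py.
(project_root / "myscripts" / "common.py").write_text(
    "def preprocess(values):\n"
    "    return [float(v) for v in values]\n"
)

# score.py sits in the root and refers to the helper by its relative path.
(project_root / "score.py").write_text(
    "from myscripts.common import preprocess\n"
    "\n"
    "def run(raw_values):\n"
    "    return preprocess(raw_values)\n"
)

# Simulate executing from the project root. In a real deployment the helper
# would be listed in the `dependencies` argument mentioned above, e.g.
# dependencies=["./myscripts/common.py"] (assumption: not executed here).
sys.path.insert(0, str(project_root))
importlib.invalidate_caches()
score = importlib.import_module("score")

result = score.run(["1", "2.5"])
print(result)
```

If the import works locally from the project root and the relative structure is preserved in the image, the same `from myscripts.common import ...` line should resolve there as well.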