python-3.x fasttext azure-machine-learning-service

Registering and downloading a fastText .bin model fails with Azure Machine Learning Service


I have a simple RegisterModel.py script that uses the Azure ML Service SDK to register a fastText .bin model. It completes successfully, and I can see the model in the Azure Portal UI (although I cannot see which files it contains). I then want to download the model with DownloadModel.py and use it for testing. However, model.download throws an error (tarfile.ReadError: file could not be opened successfully) and leaves behind a 0-byte rjtestmodel8.tar.gz file.

If I instead use Add Model in the Azure Portal and select the same .bin file, it uploads fine, and downloading that model with the DownloadModel.py script below works. So I assume something is wrong with the register script.

Here are the two scripts and the stack trace. Let me know if you can see anything wrong:

RegisterModel.py

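from azureml.core import Workspace, Model

# Load the workspace details from the local config file
ws = Workspace.from_config()

# Register the local fastText binary under the given model name
model = Model.register(workspace=ws,
                       model_name='rjSDKmodel10',
                       model_path='riskModel.bin')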

DownloadModel.py

# Works when downloading the UI Uploaded .bin file, but not the SDK registered .bin file
import os
import azureml.core
from azureml.core import Workspace, Model

ws = Workspace.from_config()
model = Model(workspace=ws, name='rjSDKmodel10')
model.download(target_dir=os.getcwd(), exist_ok=True)

Stacktrace

Traceback (most recent call last):
  File "...\.vscode\extensions\ms-python.python-2019.9.34474\pythonFiles\ptvsd_launcher.py", line 43, in <module>
    main(ptvsdArgs)
  File "...\.vscode\extensions\ms-python.python-2019.9.34474\pythonFiles\lib\python\ptvsd\__main__.py", line 432, in main
    run()
  File "...\.vscode\extensions\ms-python.python-2019.9.34474\pythonFiles\lib\python\ptvsd\__main__.py", line 316, in run_file
    runpy.run_path(target, run_name='__main__')
  File "...\.conda\envs\DoC\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "...\.conda\envs\DoC\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "...\.conda\envs\DoC\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "...\\DownloadModel.py", line 21, in <module>
    model.download(target_dir=os.getcwd(), exist_ok=True)
  File "...\.conda\envs\DoC\lib\site-packages\azureml\core\model.py", line 712, in download
    file_paths = self._download_model_files(sas_to_relative_download_path, target_dir, exist_ok)
  File "...\.conda\envs\DoC\lib\site-packages\azureml\core\model.py", line 658, in _download_model_files
    file_paths = self._handle_packed_model_file(tar_path, target_dir, exist_ok)
  File "...\.conda\envs\DoC\lib\site-packages\azureml\core\model.py", line 670, in _handle_packed_model_file
    with tarfile.open(tar_path) as tar:
  File "...\.conda\envs\DoC\lib\tarfile.py", line 1578, in open
    raise ReadError("file could not be opened successfully")
tarfile.ReadError: file could not be opened successfully
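The last frames of the trace show the SDK handing the downloaded archive straight to tarfile.open, which is what fails on a 0-byte file. As a rough diagnostic sketch (the helper name describe_archive is hypothetical and not part of the SDK), the leftover .tar.gz can be inspected with the standard library alone:

```python
import os
import tarfile


def describe_archive(path):
    """Return a short diagnosis of a downloaded model archive.

    Hypothetical helper for illustration; it only uses the standard library.
    """
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        # Matches the symptom in the question: a 0-byte .tar.gz file
        return "empty"
    if not tarfile.is_tarfile(path):
        # Non-empty, but tarfile cannot read it
        return "not a tar archive"
    return "ok"
```

Calling describe_archive on the rjtestmodel8.tar.gz left in the working directory would be expected to report "empty" for the case described above, which points at the stored artifact rather than at the download script.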

Environment

  • riskModel.bin is 6 MB
  • Azure Machine Learning service SDK (AMLS) 1.0.60
  • Python 3.7
  • Working locally with Visual Studio Code

Solution

  • The Azure Machine Learning service SDK has a bug in how it interacts with Azure Storage: if an upload has to be retried, the SDK uploads a corrupted file.

    A couple of workarounds:

    1. The bug was introduced in the 1.0.60 release. If you downgrade to AzureML-SDK 1.0.55, the code should fail when there is an issue uploading instead of silently corrupting data.
    2. The retry may be triggered by the low timeout values that the AzureML-SDK uses by default. You could investigate changing the timeout in site-packages/azureml/_restclient/artifacts_client.py.

    This bug should be fixed in the next release of the AzureML-SDK.
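    Because the corruption is silent, it may be worth checking a registered model by downloading it once and comparing it with the local file. The following is a minimal sketch using only the standard library; sha256_of and same_content are hypothetical helper names, not SDK functions:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(local_path, downloaded_path):
    """Return True if both files have identical bytes (compared by SHA-256)."""
    return sha256_of(local_path) == sha256_of(downloaded_path)
```

    A possible use, assuming model.download returns the path of the downloaded file for a single-file model: register riskModel.bin, download it into a scratch directory, and treat either an exception from model.download or a False result from same_content('riskModel.bin', downloaded_path) as a sign that the upload was corrupted.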