Tags: python, tensorflow, keras, databricks, azure-data-lake

Trouble loading a TensorFlow Keras model in Databricks from Azure Data Lake using Python


I have a model saved under project_name in a container in Azure Data Lake, and I am having issues loading it as a TensorFlow model from Databricks. Everything worked fine when I tested it in a Jupyter notebook previously; now I have to migrate the code to Databricks.

This is the code I ran.

model = tf.keras.models.load_model("abfss://[email protected]/hkay/project_name/model/keras2.tf", compile=False)

This is the error I'm getting.

UnimplementedError                        Traceback (most recent call last)
<command-3129204037083358> in <module>
      3 
      4 ## Loading a model
----> 5 model = tf.keras.models.load_model("abfss://[email protected]/hkay/project_name/model/keras2.tf", compile=False)

/local_disk0/.ephemeral_nfs/envs/pythonEnv-03228642-df50-44d1-8e0e-f760ea5a0429/lib/python3.8/site-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
     68             # To get the full stack trace, call:
     69             # `tf.debugging.disable_traceback_filtering()`
---> 70             raise e.with_traceback(filtered_tb) from None
     71         finally:
     72             del filtered_tb

/local_disk0/.ephemeral_nfs/envs/pythonEnv-03228642-df50-44d1-8e0e-f760ea5a0429/lib/python3.8/site-packages/tensorflow/python/lib/io/file_io.py in file_exists_v2(path)
    288   """
    289   try:
--> 290     _pywrap_file_io.FileExists(compat.path_to_bytes(path))
    291   except errors.NotFoundError:
    292     return False

UnimplementedError: File system scheme 'abfss' not implemented (file: 'abfss://[email protected]/hkay/project_name/model/keras2.tf')

It ran fine in the Jupyter notebook, but there the model was saved locally.

The connection itself works: I verified it by reading files from the same path. The model is 1.6 GB. I am not sure why loading fails. Does anyone have any idea?


Solution

  • Unfortunately, tf.keras does not understand abfss:// URIs: TensorFlow has no filesystem implementation registered for that scheme (hence the UnimplementedError above), so you need to save and load through a local path, as the short reproduction below shows.
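
    As a minimal reproduction of where the error comes from (using the same path as above), the failing check can be triggered directly through tf.io.gfile, the file I/O layer that tf.keras delegates to:

    import tensorflow as tf

    # Any scheme without a registered TensorFlow filesystem plugin (abfss
    # included) fails at the file-existence check with UnimplementedError
    try:
        tf.io.gfile.exists("abfss://[email protected]/hkay/project_name/model/keras2.tf")
    except tf.errors.UnimplementedError as err:
        print(err)  # File system scheme 'abfss' not implemented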

    Use dbutils.fs.cp to copy the data from the storage URL abfss://dev@axxx to a local path.

    Copy the model into /tmp/Demo_model.tf and load it from there.

    # Set up storage access. For abfss, the account key is scoped to the
    # dfs.core.windows.net endpoint (blob.core.windows.net is for wasbs)
    spark.conf.set("fs.azure.account.key.<storage_account_name>.dfs.core.windows.net", "<access_key>")

    # A model saved in SavedModel format (.tf) is a directory, so copy it recursively
    dbutils.fs.cp("abfss://[email protected]/hkay/project_name/model/keras2.tf", "/tmp/Demo_model.tf", recurse=True)
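
    One detail worth noting: dbutils.fs paths without an explicit scheme, such as /tmp/Demo_model.tf above, resolve to DBFS (dbfs:/), not the driver's local disk. On a standard cluster DBFS is FUSE-mounted on the driver at /dbfs, which is what the load step below relies on; a quick way to confirm the local view:

    import os

    # the copied model is also visible as a regular local directory
    # through the /dbfs FUSE mount
    print(os.path.exists("/dbfs/tmp/Demo_model.tf"))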
    

    You can check whether the model was copied using the code below.

    display(dbutils.fs.ls("/tmp/Demo_model.tf"))
    

    Loading the model:

    from tensorflow import keras

    # DBFS is mounted on the driver at /dbfs, so read the copied model through
    # that local path (a plain /tmp/... here would point at the driver's own
    # local disk, where nothing was copied)
    model = keras.models.load_model("/dbfs/tmp/Demo_model.tf", compile=False)
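
    Once the model loads, a quick summary is a cheap sanity check that the architecture and weights were restored:

    # prints the layer stack of the restored model
    model.summary()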