Tags: azure, databricks, azure-databricks

How to check whether a directory already exists in Databricks


How do I check whether a directory already exists in Databricks?

dir = "/mnt/published/omega/omega_output"
if(dbutils.fs.exists(dir)):
 print("dir exists")
else:
 print("dir does not exist")

This code throws the following error:

'FSHandler' object has no attribute 'exists'

Solution

  • There is no exists function in dbutils.fs. There are a few approaches to work around this:

    1. Use the local file API - it works only with mounted resources. You need to prepend /dbfs to the path:
    import os
    dir = '/mnt/....'
    if os.path.exists(f"/dbfs{dir}"):
      ...
    
    2. Use the Hadoop file API - it works with dbfs:/, abfss:/, ...
    # Access the JVM classes through the SparkContext's Py4J gateway
    URI           = sc._gateway.jvm.java.net.URI
    Path          = sc._gateway.jvm.org.apache.hadoop.fs.Path
    FileSystem    = sc._gateway.jvm.org.apache.hadoop.fs.FileSystem
    Configuration = sc._gateway.jvm.org.apache.hadoop.conf.Configuration
    
    dir = "..."
    # Pick the file system implementation that matches the URI scheme
    fs = FileSystem.get(URI(dir), Configuration())
    
    if fs.exists(Path(dir)):
      ...
    
    3. Try to list the path and catch the exception raised when it doesn't exist or isn't accessible. The main problem: you can't distinguish between files/directories that don't exist and files/directories that you don't have permission to access:
    def file_exists(dir):
      try:
        dbutils.fs.ls(dir)
      except Exception:
        # Raised both for missing paths and for paths without access
        return False
      return True
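
One way to narrow the ambiguity of the list-and-catch approach is to inspect the exception message. The following is only a minimal sketch: it assumes that the error raised for a missing path mentions java.io.FileNotFoundException, which is commonly seen on Databricks but is not guaranteed for every runtime or storage backend. The helper name path_exists and its ls parameter are hypothetical; the listing function is passed in so the helper can also be exercised outside a notebook.

```python
def path_exists(path, ls):
    """Return True if `path` can be listed, False if it is missing.

    `ls` is any callable that lists a path and raises on failure,
    for example dbutils.fs.ls in a Databricks notebook.
    """
    try:
        ls(path)
        return True
    except Exception as e:
        # Assumption: a missing path surfaces as an error whose text
        # contains the Java FileNotFoundException class name.
        if "java.io.FileNotFoundException" in str(e):
            return False
        # Anything else (e.g. a permission error) is re-raised so the
        # caller can tell it apart from "does not exist".
        raise
```

In a notebook this would be called as path_exists(dir, dbutils.fs.ls). Re-raising unknown errors is a deliberate choice here: a permission problem then fails loudly instead of being reported as a missing directory.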