
Databricks - export dataset to a raster file in DBFS using rioxarray


I am trying to follow a simple example from the rioxarray documentation to export a dataset to a raster file in DBFS.

Install the library:

%pip install rioxarray

Run example code from the docs (input dataset can be found here):

import rioxarray
rds = rioxarray.open_rasterio("/dbfs/FileStore/tables/PLANET_SCOPE_3D.nc", decode_times=False)
rds.isel(time=0).rio.to_raster("/dbfs/FileStore/tables/planet_scope.tif")

However, when I list the files in the directory (dbutils.fs.ls("dbfs:/FileStore/tables")), the file planet_scope.tif is not present.

How can I export a dataset to a raster file in DBFS?


Solution

  • I suspect this is caused by the limitations of the DBFS local file API (the /dbfs mount), such as its lack of support for random writes. The library doesn't raise an exception; it simply doesn't create the file.

    The solution is to write the raster to the driver's local disk and then copy the file into DBFS using dbutils.fs.cp:

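    import rioxarray
    rds = rioxarray.open_rasterio("/dbfs/FileStore/tables/PLANET_SCOPE_3D.nc", decode_times=False)
    # Write to the driver's local disk instead of the /dbfs mount
    rds.isel(time=0).rio.to_raster("/tmp/planet_scope.tif")
    # Copy the local file into DBFS (the "file:" prefix marks a local path)
    dbutils.fs.cp("file:/tmp/planet_scope.tif", "/FileStore/tables/planet_scope.tif", True)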
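
If you need this pattern in more than one place, the write-locally-then-copy step can be wrapped in a small helper. The following is only a sketch: write_then_copy, write_fn and copy_fn are hypothetical names introduced for illustration, not part of rioxarray or Databricks. It also raises an error when the writer produces no file, rather than failing silently.

```python
import os
import shutil
import tempfile


def write_then_copy(write_fn, dest_path, copy_fn=shutil.copyfile):
    """Create a file on local disk with write_fn, then copy it to dest_path.

    write_fn: callable that takes a local path and writes the file there.
    copy_fn:  callable (src, dst) that performs the copy (hypothetical hook;
              defaults to shutil.copyfile for ordinary file systems).
    """
    suffix = os.path.splitext(dest_path)[1]
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, "output" + suffix)
        write_fn(local_path)
        # Fail loudly if the writer silently produced nothing.
        if not os.path.exists(local_path):
            raise FileNotFoundError(f"writer did not create {local_path}")
        copy_fn(local_path, dest_path)
    return dest_path


# Possible use in a Databricks notebook (assumes rds and dbutils exist there):
# write_then_copy(
#     lambda path: rds.isel(time=0).rio.to_raster(path),
#     "dbfs:/FileStore/tables/planet_scope.tif",
#     copy_fn=lambda src, dst: dbutils.fs.cp("file:" + src, dst),
# )
```

The temporary directory is removed once the copy finishes, so nothing is left behind on the driver.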