
Read .nc files from Azure Data Lake Gen2 in Azure Databricks

I am trying to read .nc (netCDF4) files in Azure Databricks.

I have never worked with .nc files before.

  1. All the required .nc files are in Azure Data Lake Gen2.
  2. I mounted these files into Databricks at "/mnt/eco_dailyRain".
  3. I can list the contents of the mount point "/mnt/eco_dailyRain". Output:

    Out[76]: [FileInfo(path='dbfs:/mnt/eco_dailyRain/', name='', size=429390127),
     FileInfo(path='dbfs:/mnt/eco_dailyRain/', name='', size=428217143),
     FileInfo(path='dbfs:/mnt/eco_dailyRain/', name='', size=428218181),
     FileInfo(path='dbfs:/mnt/eco_dailyRain/', name='', size=428217139),
     FileInfo(path='dbfs:/mnt/eco_dailyRain/', name='', size=429390143),
     FileInfo(path='dbfs:/mnt/eco_dailyRain/', name='', size=428217137),
     FileInfo(path='dbfs:/mnt/eco_dailyRain/', name='', size=428217127),
     FileInfo(path='dbfs:/mnt/eco_dailyRain/', name='', size=428217143),
     FileInfo(path='dbfs:/mnt/eco_dailyRain/', name='', size=429390137),
     FileInfo(path='dbfs:/mnt/eco_dailyRain/', name='', size=428217127),
     FileInfo(path='dbfs:/mnt/eco_dailyRain/', name='', size=428217134),
     FileInfo(path='dbfs:/mnt/eco_dailyRain/', name='', size=428218181),
     FileInfo(path='dbfs:/mnt/eco_dailyRain/', name='', size=429390127),
     FileInfo(path='dbfs:/mnt/eco_dailyRain/', name='', size=428217143),
     FileInfo(path='dbfs:/mnt/eco_dailyRain/', name='', size=428218104),
     FileInfo(path='dbfs:/mnt/eco_dailyRain/', name='', size=428217134),
     FileInfo(path='dbfs:/mnt/eco_dailyRain/', name='', size=429390127),
     FileInfo(path='dbfs:/mnt/eco_dailyRain/', name='', size=428217223),
     FileInfo(path='dbfs:/mnt/eco_dailyRain/', name='', size=418143765),
     FileInfo(path='dbfs:/mnt/eco_dailyRain/', name='', size=370034113),
     FileInfo(path='dbfs:/mnt/eco_dailyRain/Consignments.parquet', name='Consignments.parquet', size=237709917),
     FileInfo(path='dbfs:/mnt/eco_dailyRain/', name='', size=428217137)]

Just to test whether I can read from the mount, I read 'dbfs:/mnt/eco_dailyRain/Consignments.parquet'.

This confirms that the Parquet file can be read:


Out[83]: DataFrame[CONSIGNMENT_PK: int, CERTIFICATE_NO: string, ACTOR_NAME: string, GENERATOR_FK: int, TRANSPORTER_FK: int, RECEIVER_FK: int, REC_POST_CODE: string, WASTEDESC: string, WASTE_FK: int, GEN_LICNUM: string, VOLUME: int, MEASURE: string, WASTE_TYPE: string, WASTE_ADD: string, CONTAMINENT1_FK: int, CONTAMINENT2_FK: int, CONTAMINENT3_FK: int, CONTAMINENT4_FK: int, TREATMENT_FK: int, ANZSICODE_FK: int, VEH1_REGNO: string, VEH1_LICNO: string, VEH2_REGNO: string, VEH2_LICNO: string, GEN_SIGNEE: string, GEN_DATE: timestamp, TRANS_SIGNEE: string, TRANS_DATE: timestamp, REC_SIGNEE: string, REC_DATE: timestamp, DATECREATED: timestamp, DISCREPANCY: string, APPROVAL_NUMBER: string, TR_TYPE: string, REC_WASTE_FK: int, REC_WASTE_TYPE: string, REC_VOLUME: int, REC_MEASURE: string, DATE_RECEIVED: timestamp, DATE_SCANNED: timestamp, HAS_IMAGE: string, LASTMODIFIED: timestamp]

But trying to read the netCDF4 files fails with "No such file or directory":


import datetime as dt  # Python standard library datetime module
import numpy as np
from netCDF4 import Dataset
import matplotlib.pyplot as plt

rootgrp = Dataset("dbfs:/mnt/eco_dailyRain/","r", format="NETCDF4")


FileNotFoundError: [Errno 2] No such file or directory: b'dbfs:/mnt/eco_dailyRain/'

Any clues?


  • The problem is the path format expected by the Dataset class of the netCDF4 module (see its API reference).

    The first argument of Dataset (filename) must be a path on the local, Unix-style file system, because netCDF4 reads files through ordinary local file I/O rather than through Spark. The path dbfs:/mnt/eco_dailyRain/ is a DBFS URI that Spark and dbutils understand, but local file APIs do not, which is why you get FileNotFoundError: [Errno 2] No such file or directory: b'dbfs:/mnt/eco_dailyRain/'.

    To fix it, replace dbfs:/mnt/eco_dailyRain/ with the equivalent local path /dbfs/mnt/eco_dailyRain/, which is where Databricks exposes DBFS (including your mount) on the driver's local file system:

    rootgrp = Dataset("/dbfs/mnt/eco_dailyRain/","r", format="NETCDF4")
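If you have several paths in the dbfs:/ form (for example, taken from the listing above), a small helper can translate them into the /dbfs form. This is a minimal sketch, assuming the standard /dbfs FUSE mount point on the driver; the function name dbfs_uri_to_local_path is hypothetical, not a Databricks API. Note that Dataset opens an individual file, so you would append the .nc file name to the converted directory path.

```python
# Hypothetical helper (not part of Databricks or netCDF4): translate a
# Spark/dbutils-style DBFS URI into the /dbfs FUSE path that plain Python
# libraries such as netCDF4 can open.
DBFS_SCHEME = "dbfs:"
FUSE_ROOT = "/dbfs"  # assumption: DBFS is FUSE-mounted at /dbfs on the driver


def dbfs_uri_to_local_path(path: str) -> str:
    """Return the /dbfs equivalent of a 'dbfs:/...' URI.

    Inputs without the 'dbfs:' scheme are returned unchanged.
    """
    if path.startswith(DBFS_SCHEME):
        rest = path[len(DBFS_SCHEME):]  # e.g. '/mnt/eco_dailyRain/'
        if not rest.startswith("/"):
            rest = "/" + rest
        return FUSE_ROOT + rest
    return path


print(dbfs_uri_to_local_path("dbfs:/mnt/eco_dailyRain/"))  # /dbfs/mnt/eco_dailyRain/
```

You would then call something like Dataset(dbfs_uri_to_local_path(uri), "r"), where uri is the dbfs:/ path of one .nc file.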

    You can check that the local path works from a shell cell:

    %sh ls /dbfs/mnt/eco_dailyRain
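The same check can be done from Python with the standard library, again assuming the /dbfs FUSE path is available on the driver. The function name list_nc_files is hypothetical; it only collects regular files ending in .nc (case-sensitive) directly inside the directory, and each returned path can be passed straight to Dataset.

```python
from pathlib import Path


def list_nc_files(local_dir: str) -> list[str]:
    """Return the sorted paths of regular *.nc files directly inside local_dir."""
    return sorted(str(p) for p in Path(local_dir).glob("*.nc") if p.is_file())


# On a Databricks driver, the mount is visible through the /dbfs FUSE path.
# If the directory does not exist, glob simply yields nothing.
for nc_path in list_nc_files("/dbfs/mnt/eco_dailyRain"):
    print(nc_path)
```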

    Of course, since you have mounted the container, you can also list your netCDF4 data files through the mount point '/mnt/eco_dailyRain', just as you did above.