
GDAL on Databricks Cluster Runtime 12.2 LTS


I need GDAL for my coursework.

After reading this post, I used the following init script to install GDAL on Databricks Runtime 12.2 LTS:

dbutils.fs.put("/databricks/scripts/gdal_install.sh","""
#!/bin/bash
sudo add-apt-repository ppa:ubuntugis/ppa
sudo apt-get update
sudo apt-get install -y cmake gdal-bin libgdal-dev python3-gdal""",
True)

The init script ran and the cluster started properly, but when I run import gdal in a notebook, I get the following error:

ModuleNotFoundError: No module named 'gdal'

I also tried installing GDAL on the cluster from a Maven repository, but that did not work either.

How can I get GDAL installed properly?

Thank you.


Solution

  • You can install GDAL as follows.

    Change your init script as shown below, then restart the cluster so that the script runs.

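    dbutils.fs.put("/databricks/scripts/gdal_install.sh", """#!/bin/bash
    # -y lets add-apt-repository run without waiting for a keypress
    sudo add-apt-repository -y ppa:ubuntugis/ppa
    sudo apt-get update
    # libgdal-dev installs the GDAL library, its headers and gdal-config
    sudo apt-get install -y libgdal-dev""",
    True)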
    

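If you would rather not hand-edit a triple-quoted string, the init script above can also be generated from a small function, which makes it harder to accidentally push the shebang off the first line. This is a minimal sketch: build_gdal_init_script and INIT_SCRIPT_PATH are hypothetical names, and the set -e line is an addition not in the answer (it makes the script, and therefore the cluster start, fail at the first broken command instead of continuing silently). dbutils.fs.put(path, contents, overwrite) only exists inside a Databricks notebook, so outside Databricks the script is just printed.

```python
# Hypothetical helper that builds the GDAL init script text.
INIT_SCRIPT_PATH = "/databricks/scripts/gdal_install.sh"


def build_gdal_init_script(packages=("libgdal-dev",)) -> str:
    """Return the init script text with the shebang on the very first line."""
    lines = [
        "#!/bin/bash",
        "set -e",  # assumption: stop at the first failing command
        "sudo add-apt-repository -y ppa:ubuntugis/ppa",  # -y: never wait for input
        "sudo apt-get update",
        "sudo apt-get install -y " + " ".join(packages),
    ]
    return "\n".join(lines) + "\n"


script = build_gdal_init_script()
try:
    # dbutils is only defined inside a Databricks notebook; True overwrites an old copy
    dbutils.fs.put(INIT_SCRIPT_PATH, script, True)
except NameError:
    # Outside Databricks, just show what would be written
    print(script)
```

Extra apt packages can be passed as a tuple, for example build_gdal_init_script(("libgdal-dev", "gdal-bin")).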

    After restarting, check which version of the GDAL library was installed:

    %sh
    gdal-config --version
    

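The same check can be done from a Python cell, so the version can be reused when pinning the pip package instead of being copied by hand. This is a sketch under the assumption that gdal-config (installed by libgdal-dev) is on the PATH; the function names are hypothetical, and the function returns None rather than failing when gdal-config is not found.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_gdal_version(output: str) -> str:
    """Extract an X.Y.Z version from the output of gdal-config --version."""
    match = re.search(r"\d+\.\d+\.\d+", output)
    if match is None:
        raise ValueError(f"unexpected gdal-config output: {output!r}")
    return match.group(0)


def system_gdal_version() -> Optional[str]:
    """Return the installed GDAL library version, or None if gdal-config is missing."""
    exe = shutil.which("gdal-config")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, check=True)
    return parse_gdal_version(result.stdout)


print(system_gdal_version())
```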

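    Then install the GDAL Python package with pip, pinned to exactly that version: the pip package is built against the system library and its headers, so the two versions must match.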

    Here, I got 3.3.2.

    %pip install GDAL==3.3.2
    

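If you want the pin to follow whatever gdal-config reports rather than a hard-coded 3.3.2, the install command can be assembled in Python. This is only a sketch with hypothetical function names: running pip through subprocess like this affects only the driver's Python environment, so %pip install in the notebook remains the simpler route on Databricks. The dry_run flag exists here just to show the command without running it.

```python
import subprocess
import sys


def gdal_requirement(version: str) -> str:
    """Pin the PyPI GDAL package to the system library version, e.g. GDAL==3.3.2."""
    return f"GDAL=={version}"


def install_gdal_bindings(version: str, dry_run: bool = False) -> list:
    """Install the matching GDAL bindings with the current interpreter's pip."""
    cmd = [sys.executable, "-m", "pip", "install", gdal_requirement(version)]
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError if the build fails
    return cmd


print(install_gdal_bindings("3.3.2", dry_run=True))
```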

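    Once the pip install finishes, the import works. Note that the Python bindings live in the osgeo package, so import them with from osgeo import gdal; the bare import gdal used in the question may not be available in recent GDAL versions.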
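A quick way to confirm the setup is to import the bindings through osgeo and compare the bindings version with the library they are linked against. This is a minimal sketch with hypothetical helper names; it uses gdal.__version__ for the pip-installed bindings and gdal.VersionInfo("RELEASE_NAME") for the linked libgdal, and it prints a hint instead of raising if the bindings are missing.

```python
import importlib
from types import ModuleType
from typing import Optional


def load_gdal() -> Optional[ModuleType]:
    """Import the GDAL bindings via the osgeo package; return None if they are missing."""
    try:
        return importlib.import_module("osgeo.gdal")
    except ImportError:
        return None


def versions_match(bindings_version: str, library_version: str) -> bool:
    """True when the bindings and the library report the same release string."""
    return bindings_version.strip() == library_version.strip()


gdal = load_gdal()
if gdal is None:
    print("GDAL bindings not found; rerun the pip install step")
else:
    bindings = gdal.__version__                 # version of the pip-installed bindings
    library = gdal.VersionInfo("RELEASE_NAME")  # version of the linked GDAL library
    print("bindings", bindings, "library", library, "match:", versions_match(bindings, library))
```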