pyspark, azure-synapse

Error importing delta package into Synapse notebook


I need to import the delta-spark package into an Azure Synapse notebook, but I receive the error "ModuleNotFoundError: No module named 'pyspark.errors'".

The package is installed (along with pyspark). Imports from pyspark (such as pyspark.sql.functions) work, but the import of DeltaTable fails on pyspark.errors. The same import works in a different environment (a Databricks notebook) with the same package versions. I tried findspark, but it didn't help (probably because the problem isn't importing pyspark itself, but a subpackage under it).
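
For reference, here is a quick diagnostic sketch to check which PySpark installation the interpreter actually resolves; pyspark.errors only exists from PySpark 3.4 onward, so find_spec returning None points at an older runtime copy:

import importlib.util
import pyspark

# Show which PySpark the interpreter resolves; on Synapse this is often
# the runtime copy on PYTHONPATH, not the pip-installed one.
print(pyspark.__version__)
print(pyspark.__file__)

# pyspark.errors was introduced in PySpark 3.4; None means the resolved
# installation is an older build.
print(importlib.util.find_spec("pyspark.errors"))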

code:

%pip install delta-spark

output (selected):

Requirement already satisfied: delta-spark in /nfs4/pyenv-fee9cba1-d59a-44ab-8b9e-957614b27f23/lib/python3.10/site-packages (3.0.0)  
Requirement already satisfied: pyspark<3.6.0,>=3.5.0 in /nfs4/pyenv-fee9cba1-d59a-44ab-8b9e-957614b27f23/lib/python3.10/site-packages (from delta-spark) (3.5.0)

code:

import pyspark.sql.functions as F
from pyspark.sql import Window
from delta.tables import DeltaTable

output (selected):

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In [9], line 10
      8 import pyspark.sql.functions as F
      9 from pyspark.sql import Window
---> 10 from delta.tables import DeltaTable

...

File ~/cluster-env/env/lib/python3.10/site-packages/delta/exceptions.py:20
     17 from typing import TYPE_CHECKING, Optional
     19 from pyspark import SparkContext
---> 20 from pyspark.errors.exceptions import captured
     21 from pyspark.errors.exceptions.captured import CapturedException
     22 from pyspark.sql.utils import (
     23     AnalysisException,
     24     IllegalArgumentException,
     25     ParseException
     26 )

ModuleNotFoundError: No module named 'pyspark.errors'

This is the environment setting of the Spark pool:

SPARK_HOME -> /opt/spark
PYTHONPATH -> /opt/spark/python/lib/pyspark.zip<CPS>/opt/spark/python/lib/py4j-0.10.7-src.zip<CPS>/opt/spark/python/lib/pyspark.zip<CPS>/opt/spark/python/lib/py4j-0.10.7-src.zip
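
The py4j-0.10.7 entry suggests the pool's bundled runtime predates PySpark 3.4, which is where pyspark.errors first appeared. Because PYTHONPATH entries are searched before site-packages, the runtime's pyspark.zip shadows the pip-installed pyspark 3.5.0 that delta-spark 3.0.0 requires. A minimal sketch to confirm the search order:

import sys

# PYTHONPATH entries are placed ahead of site-packages on sys.path, so
# the runtime's pyspark.zip shadows the pip-installed pyspark 3.5.0
# that delta-spark depends on.
for entry in sys.path:
    print(entry)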

Solution

  • from delta.tables import DeltaTable
    

    This import works in a Synapse notebook without installing the delta-spark package, because the Synapse runtime already ships with Delta Lake.

    I got the same error when I ran from delta.tables import DeltaTable after %pip install delta-spark.

    But before installing it, the import works fine. Installing delta-spark on top of the delta package that ships with the Synapse runtime is the likely cause of the error: pip pulls in pyspark 3.5.0 as a dependency of delta-spark 3.0.0, while the runtime's own, older PySpark still takes precedence at import time and lacks the pyspark.errors module.

    To use from delta.tables import DeltaTable, don't install delta-spark. You can use it directly, like below.

    [Screenshot: the import succeeding in a Synapse notebook without any package installation]

    As the screenshot shows, it works fine without installing any package.
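
    Below is a minimal usage sketch, assuming a Delta table already exists at a hypothetical abfss path (replace the placeholders with your own storage account and container); spark is the session Synapse provides in every notebook:

    from delta.tables import DeltaTable
    import pyspark.sql.functions as F

    # Hypothetical path; replace with your own Delta table location.
    path = "abfss://<container>@<account>.dfs.core.windows.net/delta/events"

    # spark is predefined in Synapse notebooks; no %pip install needed.
    dt = DeltaTable.forPath(spark, path)

    # Example operations: delete old rows, then inspect the table.
    dt.delete(F.col("event_date") < "2023-01-01")
    dt.toDF().show(5)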