I am trying to integrate Apache Arrow with Apache Spark in a PySpark application, but I am encountering an error related to sun.misc.Unsafe or java.nio.DirectByteBuffer during execution.
import os
import pandas as pd
from pyspark.sql import SparkSession
extra_java_options = os.getenv("SPARK_EXECUTOR_EXTRA_JAVA_OPTIONS", "")
spark = SparkSession.builder \
    .appName("ArrowPySparkExample") \
    .getOrCreate()
spark.conf.set("Dio.netty.tryReflectionSetAccessible", "true")
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
pdf = pd.DataFrame(["midhun"])
df = spark.createDataFrame(pdf)
result_pdf = df.select("*").toPandas()
Error Message:
in stage 0.0 (TID 11) (192.168.140.22 executor driver): java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
at org.apache.arrow.memory.util.MemoryUtil.directBuffer(MemoryUtil.java:174)
at org.apache.arrow.memory.ArrowBuf.getDirectBuffer(ArrowBuf.java:229)
Environment:
Apache Spark version: 3.4
Apache Arrow version: 1.5
Java version: JDK 21
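
For context, io.netty.tryReflectionSetAccessible is a JVM system property, normally written as the flag -Dio.netty.tryReflectionSetAccessible=true. Setting the key "Dio.netty.tryReflectionSetAccessible" through spark.conf.set() on a running session, as in the snippet above, most likely never reaches the JVM. Below is a minimal sketch of passing the flag at JVM startup through spark.driver.extraJavaOptions and spark.executor.extraJavaOptions instead. The helper names build_spark_conf and create_session are hypothetical, and the --add-opens flag is an assumption taken from Arrow's Java guidance for JDK 9 and later. This has not been verified on JDK 21, and the Spark 3.4 documentation lists only Java 8/11/17 as supported, so the flags alone may not remove the error.

```python
# JVM flags must be present when the JVM starts, so they go into the
# extraJavaOptions settings before the session (and its JVM) is created.
JVM_FLAGS = [
    "-Dio.netty.tryReflectionSetAccessible=true",   # Netty reflection switch
    "--add-opens=java.base/java.nio=ALL-UNNAMED",   # assumption: needed by Arrow on JDK 9+
]


def build_spark_conf(extra=""):
    """Return Spark settings that carry the JVM flags (hypothetical helper).

    `extra` holds any additional JVM options, e.g. the value of the
    SPARK_EXECUTOR_EXTRA_JAVA_OPTIONS environment variable.
    """
    parts = JVM_FLAGS + ([extra] if extra else [])
    java_options = " ".join(parts)
    return {
        "spark.driver.extraJavaOptions": java_options,
        "spark.executor.extraJavaOptions": java_options,
        "spark.sql.execution.arrow.pyspark.enabled": "true",
    }


def create_session(conf):
    """Create the SparkSession with the given settings (hypothetical helper)."""
    # Imported here so the settings can be built without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("ArrowPySparkExample")
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With this sketch, the session would be created as create_session(build_spark_conf(os.getenv("SPARK_EXECUTOR_EXTRA_JAVA_OPTIONS", ""))) before any other SparkSession exists in the process, since an already running JVM does not pick up new startup flags.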
Same issue with:
Downgrading Java to test the minimum supported version.
Update: