Search code examples
javaapache-sparkapache-spark-sqlapache-spark-datasetapache-spark-encoders

spark sql encoder for immutable data type


I've generally used immutable value types when writing java code. Sometimes it's been through libraries (Immutables, AutoValue, Lombok), but mostly just vanilla java classes with:

  • all final fields
  • a constructor with all fields as parameters

(This question is for java 11 and below, given current spark support).

In Spark Sql, data types require an Encoder. Using off-the-shelf encoders like Encoder.bean(MyType.class), using such an immutable data type results in "illegal reflective access operation".

I'm curious what the spark sql (dataset) approach is here. Obviously I could relax this and make it a mutable pojo.


Update

Looking into the code for Encoders.bean it really does have to be a classic, mutable POJO. The reflection code looks for appropriate setters. Further (and this is documented) the only supported collection types are array, list and map (not set).


Solution

  • This was actually a misdiagnosis. The immutability of my data type was not causing the reflective access issues. It was a JVM 11+ issue (mostly noted here) https://github.com/renaissance-benchmarks/renaissance/issues/241

    By adding the following JVM arguments everything is working correctly:

    --illegal-access=deny --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED