I've generally used immutable value types when writing java code. Sometimes it's been through libraries (Immutables, AutoValue, Lombok), but mostly just vanilla java classes with:
final
fields(This question is for java 11 and below, given current spark support).
In Spark Sql, data types require an Encoder
. Using off-the-shelf encoders like Encoder.bean(MyType.class)
, using such an immutable data type results in "illegal reflective access operation".
I'm curious what the spark sql (dataset) approach is here. Obviously I could relax this and make it a mutable pojo.
Update
Looking into the code for Encoders.bean
it really does have to be a classic, mutable POJO. The reflection code looks for appropriate setters. Further (and this is documented) the only supported collection types are array
, list
and map
(not set
).
This was actually a misdiagnosis. The immutability of my data type was not causing the reflective access issues. It was a JVM 11+ issue (mostly noted here) https://github.com/renaissance-benchmarks/renaissance/issues/241
By adding the following JVM arguments everything is working correctly:
--illegal-access=deny --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED