Package-private scope in Scala visible from Java


I just found out about a pretty weird behaviour of Scala scoping when bytecode generated from Scala code is used from Java code. Consider the following snippet using Spark (Spark 1.4, Hadoop 2.6):

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;

public class Test {
    public static void main(String[] args) {
        JavaSparkContext sc = 
            new JavaSparkContext(new SparkConf()
                                .setMaster("local[*]")
                                .setAppName("test"));

        Broadcast<List<Integer>> broadcast = sc.broadcast(Arrays.asList(1, 2, 3));

        broadcast.destroy(true);

        // fails with java.io.IOException: org.apache.spark.SparkException: 
        // Attempted to use Broadcast(0) after it was destroyed
        sc.parallelize(Arrays.asList("task1", "task2"), 2)
          .foreach(x -> System.out.println(broadcast.getValue()));
    }
}

The failure itself is expected, since I deliberately destroy the Broadcast before using it. The surprise is that, in my mental model, this code should not even compile, let alone run.

Indeed, Broadcast.destroy(Boolean) is declared as private[spark], so it should not be visible from my code. I'll try looking at the bytecode of Broadcast, but that's not my specialty, which is why I'm posting this question. Sorry for not creating an example that does not depend on Spark, but you get the idea. Note that this is not specific to Broadcast: I can call various other package-private methods of Spark as well.

Any idea what's going on?


Solution

  • We can reproduce this issue with a simpler example:

    package yuvie
    
    class X {
      private[yuvie] def destory(d: Boolean) = true
    }
    

    And inspect the compiled class with javap:

    [yuvali@localhost yuvie]$ javap -p X.class 
    Compiled from "X.scala"
    public class yuvie.X {
      public boolean destory(boolean);
      public yuvie.X();
    }
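
The same check can be made from Java itself through reflection. The sketch below is only an illustration: `VisibilityCheck`, `Sample`, and `accessOf` are hypothetical names, and `Sample` is a plain Java stand-in rather than a Scala-compiled class. It reports the JVM-level access of a declared method, which is the information javap prints above.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class VisibilityCheck {

    // Hypothetical stand-in class; not part of Spark or of the example above.
    static class Sample {
        public boolean exposed(boolean d) { return d; }
        boolean packagePrivate(boolean d) { return d; }  // no modifier = Java package-private
        private boolean hidden(boolean d) { return d; }
    }

    // Returns the JVM-level access of a declared method as a string.
    static String accessOf(Class<?> cls, String name, Class<?>... params) {
        Method m;
        try {
            m = cls.getDeclaredMethod(name, params);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + name, e);
        }
        int mod = m.getModifiers();
        if (Modifier.isPublic(mod)) return "public";
        if (Modifier.isProtected(mod)) return "protected";
        if (Modifier.isPrivate(mod)) return "private";
        return "package-private";
    }

    public static void main(String[] args) {
        for (String name : new String[] {"exposed", "packagePrivate", "hidden"}) {
            System.out.println(name + ": " + accessOf(Sample.class, name, boolean.class));
        }
    }
}
```

With the compiled `yuvie.X` on the classpath, `accessOf(yuvie.X.class, "destory", boolean.class)` should agree with the javap output and report `public`. The same reasoning suggests `public` for `Broadcast.destroy(boolean)`, though that is not verified here.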
    

    We see that private[package] in Scala becomes public in the generated bytecode. Why? Because Java's package-private access is not equivalent to Scala's private[package]. There is a nice explanation in this post:

    The important distinction is that 'private [mypackage]' in Scala is not Java package-private, however much it looks like it. Scala packages are truly hierarchical, and 'private [mypackage]' grants access to classes and objects up to "mypackage" (including all the hierarchical packages that may be between). (I don't have the Scala spec reference for this and my understanding here may be hazy, I'm using [4] as a reference.) Java's packages are not hierarchical, and package-private grants access only to classes in that package, as well as subclasses of the original class, something that Scala's 'private [mypackage]' does not allow.

    So, 'private [mypackage]' is both more and less restrictive than Java package-private. For both reasons, JVM package-private can't be used to implement it, and the only option that allows the uses that Scala exposes in the compiler is 'public.'
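
To make the difference concrete, here is a rough sketch of the two rules as plain string checks on package names. It is a simplification under stated assumptions: it ignores class loaders and subclass access, the names `PackageRules`, `javaPackagePrivateAllows`, and `scalaQualifiedPrivateAllows` are made up for illustration, and `com.example` is a hypothetical user package.

```java
public class PackageRules {

    // Java/JVM package-private (simplified): the caller must be in exactly the same package.
    static boolean javaPackagePrivateAllows(String ownerPkg, String callerPkg) {
        return ownerPkg.equals(callerPkg);
    }

    // Scala private[qualifier] (simplified): the caller must be in the qualifier
    // package or in any package nested below it.
    static boolean scalaQualifiedPrivateAllows(String qualifierPkg, String callerPkg) {
        return callerPkg.equals(qualifierPkg) || callerPkg.startsWith(qualifierPkg + ".");
    }

    public static void main(String[] args) {
        String owner = "org.apache.spark.broadcast";  // package that contains Broadcast
        String qualifier = "org.apache.spark";        // the [spark] in private[spark]
        String[] callers = {
            "org.apache.spark.broadcast",
            "org.apache.spark.rdd",
            "com.example"                             // hypothetical user package
        };
        for (String caller : callers) {
            System.out.println(caller
                + " scala=" + scalaQualifiedPrivateAllows(qualifier, caller)
                + " jvm=" + javaPackagePrivateAllows(owner, caller));
        }
    }
}
```

The middle case is the one that matters: a caller in `org.apache.spark.rdd` is allowed by the Scala rule but would be rejected by JVM package-private access, so the Scala compiler cannot emit the method as package-private. It emits it as public and enforces `private[spark]` only when compiling Scala sources, which is why the Java code in the question compiles.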