
Spark adds hidden parameter to constructor of a Scala class


I do not know how to explain this, but Spark seems to add a hidden (implicit?) parameter to the constructor. Here is the code I tried in spark-shell (in the regular Scala shell, the parameter list would be empty):

scala> class A {}
defined class A

scala> classOf[A].getConstructors()(0).getAnnotatedParameterTypes
res0: Array[java.lang.reflect.AnnotatedType] = Array(sun.reflect.annotation.AnnotatedTypeFactory$AnnotatedTypeBaseImpl@5ed65e4b)

Because of this parameter, I cannot pass my custom InputFormat class to Spark's hadoopFile function. Any hints on what's going on here, or at least on how I can create a class with a parameterless constructor?


Solution

  • The behavior seems to be the same as in the ordinary Scala REPL:

    $ scala
    Welcome to Scala 2.13.3 (Java HotSpot(TM) 64-Bit GraalVM EE 19.3.0, Java 1.8.0_231).
    Type in expressions for evaluation. Or try :help.
    
    scala> class A {}
    class A
    
    scala> classOf[A].getConstructors()(0).getAnnotatedParameterTypes
    val res0: Array[java.lang.reflect.AnnotatedType] = Array(sun.reflect.annotation.AnnotatedTypeFactory$AnnotatedTypeBaseImpl@383864d5)
    
    scala> classOf[A].getConstructors()(0).getParameters
    val res1: Array[java.lang.reflect.Parameter] = Array(final $iw $outer)
    

    The REPL makes the class nested: the code of every line you enter is wrapped in synthetic outer classes, and those wrappers are instantiated. Because A becomes an inner class, its constructor gets an extra parameter holding the instance of the enclosing class ($outer is the name of the parameter, $iw is the outer class). You can reproduce this behavior as follows:

    class X {
      class A {}
    }
    
    object App {
      def main(args: Array[String]): Unit = {
        val x = new X
    
        println(classOf[x.A].getConstructors()(0).getAnnotatedParameterTypes.mkString(","))
        // sun.reflect.annotation.AnnotatedTypeFactory$AnnotatedTypeBaseImpl@2f7c7260
    
        println(classOf[x.A].getConstructors()(0).getParameters.mkString(","))
        // final X $outer
      }
    }
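
    To relate this to the hadoopFile problem: frameworks that instantiate a user-supplied class reflectively usually look up a zero-argument constructor, and an inner class such as X#A simply does not have one. The sketch below imitates such a lookup with plain Java reflection. The names `Outer`, `Inner`, `NoArgCheck` and `hasNoArgConstructor` are hypothetical, and it is an assumption here (not something shown above) that Spark/Hadoop performs exactly this kind of lookup for an InputFormat.

    ```scala
    import scala.util.Try

    // Hypothetical classes: Inner is an inner class, just like A inside the REPL wrapper
    class Outer {
      class Inner
    }

    object NoArgCheck {
      // Hypothetical helper: true only if the class declares a constructor
      // taking no arguments, which a reflective "instantiate this class" call needs
      def hasNoArgConstructor(cls: Class[_]): Boolean =
        Try(cls.getDeclaredConstructor()).isSuccess

      def main(args: Array[String]): Unit = {
        val innerClass = classOf[Outer#Inner]

        // The only constructor of Outer#Inner takes one argument: the enclosing Outer ($outer)
        println(innerClass.getDeclaredConstructors.map(_.getParameterCount).mkString(","))
        // 1

        println(hasNoArgConstructor(innerClass))
        // false
      }
    }
    ```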
    

    If you run the REPL with the compiler option -Xprint:typer switched on (e.g. scala -Xprint:typer or spark-shell -Xprint:typer), you'll see

    $ scala -Xprint:typer
    Welcome to Scala 2.13.3 (Java HotSpot(TM) 64-Bit GraalVM EE 19.3.0, Java 1.8.0_231).
    Type in expressions for evaluation. Or try :help.
    
    scala> class A
    [[syntax trees at end of                     typer]] // <console>
    package $line3 {
      sealed class $read extends AnyRef with Serializable {
        def <init>(): $line3.$read = {
          $read.super.<init>();
          ()
        };
        sealed class $iw extends AnyRef with java.io.Serializable {
          def <init>(): $iw = {
            $iw.super.<init>();
            ()
          };
          class A extends scala.AnyRef {
            def <init>(): A = {
              A.super.<init>();
              ()
            }
          }
        };
        private[this] val $iw: $iw = new $read.this.$iw();
        <stable> <accessor> def $iw: $iw = $read.this.$iw
      };
      object $read extends scala.AnyRef with java.io.Serializable {
        def <init>(): type = {
          $read.super.<init>();
          ()
        };
        private[this] val INSTANCE: $line3.$read = new $read();
        <stable> <accessor> def INSTANCE: $line3.$read = $read.this.INSTANCE;
        <synthetic> private def writeReplace(): Object = new scala.runtime.ModuleSerializationProxy(classOf[$line3.$read$])
      }
    }
    
    class A
    

    So the value for this additional constructor parameter $outer can be obtained as $line3.$read.INSTANCE.$iw:

    scala> classOf[A].getConstructors()(0).newInstance($line3.$read.INSTANCE.$iw)
    
    ...
    
    val res0: Object = A@282ffbf5
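
    The same encoding can be reproduced outside the REPL, which makes it easier to see why $line3.$read.INSTANCE.$iw is the right argument. In the sketch below, `Read`, `Iw`, `INSTANCE`, `iw` and `WrapperDemo` are hypothetical stand-ins for the compiler-generated $read, $iw and INSTANCE; only the nesting mirrors the typer output above.

    ```scala
    // Hypothetical stand-ins for the REPL's synthetic $read / $iw wrappers
    class Read extends Serializable {
      class Iw extends Serializable {
        class A
      }
      // Like `$read.$iw`: one Iw instance per Read instance
      val iw: Iw = new Iw
    }

    object Read {
      // Like `$read.INSTANCE`
      val INSTANCE: Read = new Read
    }

    object WrapperDemo {
      // A's only public constructor takes the enclosing Iw instance ($outer)
      def newA(): Any =
        classOf[Read#Iw#A].getConstructors()(0).newInstance(Read.INSTANCE.iw)

      def main(args: Array[String]): Unit = {
        // Shows the single parameter type: the Iw wrapper class
        println(classOf[Read#Iw#A].getConstructors()(0).getParameterTypes.mkString(","))
        println(newA().getClass.getName)
      }
    }
    ```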
    

    Be careful: the encoding can change between Scala versions. For example, spark-shell from Spark 3.0.1 (pre-built for Hadoop 3.2) uses Scala 2.12.10, and there you should use $lineXXX.$read.INSTANCE.$iw.$iw instead of $lineXXX.$read.INSTANCE.$iw:

    $ spark-shell -Xprint:typer
    20/11/25 16:32:16 WARN Utils: Your hostname, dmitin-HP-Pavilion-Laptop resolves to a loopback address: 127.0.1.1; using 192.168.0.103 instead (on interface wlo1)
    20/11/25 16:32:16 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
    20/11/25 16:32:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    Spark context Web UI available at http://192.168.0.103:4040
    Spark context available as 'sc' (master = local[*], app id = local-1606314741512).
    Spark session available as 'spark'.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 3.0.1
          /_/
             
    Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit GraalVM EE 19.3.0, Java 1.8.0_231)
    Type in expressions to have them evaluated.
    Type :help for more information.
    
    scala> class A
    [[syntax trees at end of                     typer]] // <console>
    package $line14 {
      sealed class $read extends AnyRef with java.io.Serializable {
        def <init>(): $line14.$read = {
          $read.super.<init>();
          ()
        };
        sealed class $iw extends AnyRef with java.io.Serializable {
          def <init>(): $read.this.$iw = {
            $iw.super.<init>();
            ()
          };
          sealed class $iw extends AnyRef with java.io.Serializable {
            def <init>(): $iw = {
              $iw.super.<init>();
              ()
            };
            class A extends scala.AnyRef {
              def <init>(): A = {
                A.super.<init>();
                ()
              }
            }
          };
          private[this] val $iw: $iw = new $iw.this.$iw();
          <stable> <accessor> def $iw: $iw = $iw.this.$iw
        };
        private[this] val $iw: $read.this.$iw = new $read.this.$iw();
        <stable> <accessor> def $iw: $read.this.$iw = $read.this.$iw
      };
      object $read extends scala.AnyRef with Serializable {
        def <init>(): $line14.$read.type = {
          $read.super.<init>();
          ()
        };
        private[this] val INSTANCE: $line14.$read = new $read();
        <stable> <accessor> def INSTANCE: $line14.$read = $read.this.INSTANCE;
        <synthetic> private def readResolve(): Object = $line14.$read
      }
    }
    
    defined class A
    
    scala> classOf[A].getConstructors()(0).newInstance($line14.$read.INSTANCE.$iw.$iw)
    
    ...
    
    res0: Any = A@6621ab0c
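
    Because the number of $iw levels differs between Scala versions, a small helper can at least check that the object you pass is what the constructor actually expects, instead of failing with a less obvious reflection error. The sketch below is a hypothetical helper (`ReflectiveNew.newInstance` is not a Spark or Scala API): it uses a public no-argument constructor when one exists and otherwise looks for a one-argument constructor whose parameter type accepts the supplied outer instance. You still have to obtain the outer instance yourself, e.g. $line14.$read.INSTANCE.$iw.$iw in the spark-shell session above.

    ```scala
    import java.lang.reflect.Constructor

    object ReflectiveNew {
      // Hypothetical helper, not part of Spark or the Scala library
      def newInstance[T](cls: Class[T], outer: Option[AnyRef]): T = {
        val ctors = cls.getConstructors.toList.map(_.asInstanceOf[Constructor[T]])

        ctors.find(_.getParameterCount == 0) match {
          // The class has a public no-argument constructor: use it
          case Some(noArg) => noArg.newInstance()

          // Inner class: expect a single parameter, the hidden $outer reference
          case None =>
            val o = outer.getOrElse(throw new IllegalArgumentException(
              s"${cls.getName} has no public no-arg constructor; an outer instance is required"))
            val ctor = ctors
              .find(c => c.getParameterCount == 1 && c.getParameterTypes()(0).isInstance(o))
              .getOrElse(throw new IllegalArgumentException(
                s"no constructor of ${cls.getName} accepts an instance of ${o.getClass.getName}"))
            ctor.newInstance(o)
        }
      }
    }

    // Hypothetical inner class used for the demonstration
    class Outer {
      class Inner
    }

    object ReflectiveNewDemo {
      def main(args: Array[String]): Unit = {
        val sb = ReflectiveNew.newInstance(classOf[java.lang.StringBuilder], None)
        val inner = ReflectiveNew.newInstance(classOf[Outer#Inner], Some(new Outer))
        println(sb.getClass.getName) // java.lang.StringBuilder
        println(inner.isInstanceOf[Outer#Inner]) // true
      }
    }
    ```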
    

    In Scala 2.12.6, scala -Xprint:typer produces

    $ ./scala -Xprint:typer
    Welcome to Scala 2.12.6 (Java HotSpot(TM) 64-Bit GraalVM EE 19.3.0, Java 1.8.0_231).
    Type in expressions for evaluation. Or try :help.
    
    scala> class A
    [[syntax trees at end of                     typer]] // <console>
    package $line3 {
      object $read extends scala.AnyRef {
        def <init>(): $line3.$read.type = {
          $read.super.<init>();
          ()
        };
        object $iw extends scala.AnyRef {
          def <init>(): type = {
            $iw.super.<init>();
            ()
          };
          object $iw extends scala.AnyRef {
            def <init>(): type = {
              $iw.super.<init>();
              ()
            };
            class A extends scala.AnyRef {
              def <init>(): A = {
                A.super.<init>();
                ()
              }
            }
          }
        }
      }
    }
    
    defined class A
    

    So here the class A is nested inside an object ($line3.$read.$iw.$iw) rather than inside a class, and in that case no additional parameter is added to the constructor of A:

    object X {
      class A {}
    }
    
    object App {
      def main(args: Array[String]): Unit = {
        val x = X
    
        println(classOf[x.A].getConstructors()(0).getAnnotatedParameterTypes.toList)
        // List()
    
        println(classOf[x.A].getConstructors()(0).getParameters.toList)
        // List()
      }
    }
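
    For the original hadoopFile use case, this suggests defining the InputFormat somewhere it ends up with a no-argument constructor, for example at the top level of a source file compiled outside the REPL, or inside a top-level object. A minimal sketch of the contrast, assuming the code is compiled as an ordinary source file (not pasted into the REPL); the names `Formats`, `InObject`, `Holder`, `InClass` and `ConstructorContrast` are hypothetical placeholders for an InputFormat, and plain reflection stands in for whatever lookup Hadoop actually performs.

    ```scala
    import scala.util.Try

    // Hypothetical placeholders for an InputFormat implementation
    object Formats {
      // Nested in a top-level object: the object is static, so no $outer is needed
      class InObject
    }

    class Holder {
      // Nested in a class: every instance needs a reference to its Holder
      class InClass
    }

    object ConstructorContrast {
      // Parameter counts of all declared constructors of a class
      def paramCounts(cls: Class[_]): List[Int] =
        cls.getDeclaredConstructors.toList.map(_.getParameterCount)

      def main(args: Array[String]): Unit = {
        println(paramCounts(classOf[Formats.InObject])) // List(0)
        println(paramCounts(classOf[Holder#InClass])) // List(1)

        // A zero-argument instantiation therefore only works for the first class
        println(Try(classOf[Formats.InObject].getDeclaredConstructor().newInstance()).isSuccess) // true
        println(Try(classOf[Holder#InClass].getDeclaredConstructor().newInstance()).isSuccess) // false
      }
    }
    ```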