apache-spark build sbt profiles

How to build Spark 1.6.1 with SBT on Windows using Hadoop profiles?


How can one activate Hadoop and YARN profiles while building Spark on Windows (8-10) with SBT?

>sbt package

The command above works, but I could not activate profiles with the following:

>sbt -Pyarn package

I'm asking because Maven (mvn) is exceptionally slow compared to SBT. I have experience building Spark on Linux with both SBT and Maven.


Solution

  • You have to use the ./build/sbt script bundled with the Spark source distribution. It sources another script, sbt-launch-lib.bash, which collects the requested profiles and exports them in the SBT_MAVEN_PROFILES environment variable:

    enableProfile () {
      dlog "[enableProfile] arg = '$1'"
      maven_profiles=( "${maven_profiles[@]}" "$1" )
      export SBT_MAVEN_PROFILES="${maven_profiles[@]}"
    }
    

    On the other side, the project definition SparkBuild extends PomBuild, which lets SBT reuse the Maven project definition (including its profiles). It reads the profiles from that environment variable:

    override val profiles = {
      val profiles = Properties.envOrNone("SBT_MAVEN_PROFILES") match {
        ...
      }
      profiles
    }
    

    So it should work if you run it like this (using Cygwin):

    sh build/sbt -Pyarn package
    

    However, this did not work for me out of the box, because the script resolved the path to sbt-launch-lib.bash incorrectly. I therefore changed one line in build\sbt from:

    . "$(dirname "$(realpath "$0")")"/sbt-launch-lib.bash
    

    to

    . "$(dirname "$(realpath "$0")")"/build/sbt-launch-lib.bash
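
To see the hand-off between the launcher script and the build definition in isolation, here is a minimal stand-alone sketch. It is not the real sbt-launch-lib.bash: the names enable_profile and collect_profiles are hypothetical, the argument loop is an assumption about how -P flags reach the function, and a plain space-separated string replaces the bash array so that the sketch runs under any POSIX shell. It only shows the idea that each -P flag is appended to SBT_MAVEN_PROFILES and exported for the child sbt process to read.

```shell
#!/bin/sh
# Hypothetical stand-alone sketch; NOT the real sbt-launch-lib.bash.

SBT_MAVEN_PROFILES=""

# Append one profile flag to the space-separated list and export it,
# so that a child process (sbt, in the real build) can read it from its environment.
enable_profile() {
  SBT_MAVEN_PROFILES="${SBT_MAVEN_PROFILES:+$SBT_MAVEN_PROFILES }$1"
  export SBT_MAVEN_PROFILES
}

# Assumed argument loop: record every -P<name> flag, ignore everything else.
# The flag is stored unchanged; how it is parsed later is not shown here.
collect_profiles() {
  for arg in "$@"; do
    case "$arg" in
      -P*) enable_profile "$arg" ;;
    esac
  done
}

collect_profiles -Pyarn -Phadoop-2.6 package
echo "SBT_MAVEN_PROFILES=$SBT_MAVEN_PROFILES"
```

Running the sketch prints SBT_MAVEN_PROFILES=-Pyarn -Phadoop-2.6. This also suggests why a plain `sbt -Pyarn package` has no effect: nothing sets the variable before the build definition looks for it, unless you go through build/sbt or export it yourself.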