mesos, mesosphere, marathon, dcos

Spark submit using the Mesos DC/OS CLI


I'm trying to start a Spark Streaming job on Mesos using the DC/OS CLI. I'm able to start the job, but my program expects a config file to be passed as a CLI parameter. How do I achieve this with dcos spark run --submit-args?

I tried --files http://server/path/to//file, hoping it would download the file, but that didn't work. The driver starts but then fails because the config file is missing.

I also tried rolling the jar and the config file into a tar and submitting that. I can see in the Mesos logs that the tar was fetched and untarred, and both the config file and the jar are present in the working directory. But the job fails with a ClassNotFoundException. I suspect something was not right about how spark-submit was started.

dcos spark run --submit-args="--supervise --deploy-mode cluster --class package.name.classname http://file-server:8000/Streaming.tar.gz Streaming.conf"

Any hints on how to proceed? Also, in which log file can I see the underlying spark-submit command used by DC/OS?


Solution

  • Here is an example of a command you should launch in order to make it work:

    dcos spark run --submit-args='--conf spark.mesos.uris=https://s3-us-west-2.amazonaws.com/andrey-so-36323287/pi.conf --class JavaSparkPiConf https://s3-us-west-2.amazonaws.com/andrey-so-36323287/sparkPi_without_config_file.jar /mnt/mesos/sandbox/pi.conf'

    Where:

    --conf spark.mesos.uris=... is a comma-separated list of URIs to be downloaded to the sandbox when the driver or an executor is launched by Mesos. This applies to both coarse-grained and fine-grained mode.

    /mnt/mesos/sandbox/pi.conf is the path to the downloaded file, which your main class receives as its 0th argument (see the JavaSparkPiConf snippet below). /mnt/mesos/sandbox/ is a standard path inside the container that is mapped to the corresponding Mesos task sandbox.
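
    Since spark.mesos.uris takes a comma-separated list, several files can be pulled into the sandbox in one submission. A hypothetical sketch of such a command (these URLs and file names are placeholders, not part of the original answer):

    dcos spark run --submit-args='--conf spark.mesos.uris=https://example.com/app.conf,https://example.com/log4j.properties --class JavaSparkPiConf https://example.com/app.jar /mnt/mesos/sandbox/app.conf'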

    import java.io.FileInputStream;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Scanner;
    
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.api.java.function.Function2;
    
    public final class JavaSparkPiConf {
    
      public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf().setAppName("JavaSparkPi");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);
    
        // args[0] is the config file path passed after the jar in --submit-args,
        // e.g. /mnt/mesos/sandbox/pi.conf fetched via spark.mesos.uris.
        Scanner scanner = new Scanner(new FileInputStream(args[0]));
        int slices;
        if (scanner.hasNextInt()) {
          slices = scanner.nextInt();
        } else {
          slices = 2; // fall back to a default when the file holds no integer
        }
        int n = 100000 * slices;
        List<Integer> l = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
          l.add(i);
        }
    
        JavaRDD<Integer> dataSet = jsc.parallelize(l, slices);
    
        // Monte Carlo estimation of Pi: count random points that fall
        // inside the unit circle.
        int count = dataSet.map(new Function<Integer, Integer>() {
          @Override
          public Integer call(Integer integer) {
            double x = Math.random() * 2 - 1;
            double y = Math.random() * 2 - 1;
            return (x * x + y * y < 1) ? 1 : 0;
          }
        }).reduce(new Function2<Integer, Integer, Integer>() {
          @Override
          public Integer call(Integer integer, Integer integer2) {
            return integer + integer2;
          }
        });
    
        System.out.println("Pi is roughly " + 4.0 * count / n);
    
        jsc.stop();
      }
    }
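
    Applied to the question's setup, the same pattern would look roughly like the command below. This assumes the jar is served on its own (the http://file-server:8000/Streaming.jar URL is a hypothetical stand-in) rather than inside the tarball, so spark-submit receives a plain jar as the application artifact, while the config file is fetched separately via spark.mesos.uris:

    dcos spark run --submit-args='--supervise --deploy-mode cluster --conf spark.mesos.uris=http://file-server:8000/Streaming.conf --class package.name.classname http://file-server:8000/Streaming.jar /mnt/mesos/sandbox/Streaming.conf'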