Tags: hadoop, apache-spark, kerberos, oozie, hortonworks-data-platform

Spark Launcher jobs not starting after 24 hours because the token can't be found in cache


I have a Java application that runs continuously and checks a database table for new records. When a new record is added to the table, the Java application unzips a file and puts it into an HDFS location, and then a Spark job is triggered (I trigger the Spark job programmatically using the SparkLauncher class inside the Java application) to process the newly added file in HDFS.

I have scheduled the Java application on the cluster using an Oozie Java action. The cluster is a Kerberized HDP cluster.

The job works perfectly for the first 24 hours: all the files are unzipped and the Spark jobs run.

After 24 hours, however, the Java application still unzips the files, but the Spark job is no longer triggered in the ResourceManager.

Exception: Exception encountered while connecting to the server :INFO: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (owner=****, renewer=oozie mr token, realUser=oozie, issueDate=1498798762481, maxDate=1499403562481, sequenceNumber=36550, masterKeyId=619) can't be found in cache
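The two timestamps in this message are epoch milliseconds, so they can be decoded to check the 24-hour pattern. The Java sketch below is illustrative only. The class name TokenLifetime is hypothetical. It also assumes that the cluster keeps Hadoop's default delegation-token renew interval of 24 hours (86400000 ms, the default of both dfs.namenode.delegation.token.renew-interval and yarn.resourcemanager.delegation.token.renew-interval), which a given HDP cluster may override.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenLifetime {
    // Token description copied from the exception (owner masked as in the original log).
    static final String SAMPLE = "token (owner=****, renewer=oozie mr token, realUser=oozie, "
            + "issueDate=1498798762481, maxDate=1499403562481, sequenceNumber=36550, "
            + "masterKeyId=619)";

    // Assumption: the cluster uses Hadoop's default renew interval of 24 hours.
    static final Duration DEFAULT_RENEW_INTERVAL = Duration.ofHours(24);

    // Extracts a field such as "issueDate=1498798762481" and reads it as epoch milliseconds.
    static Instant timestamp(String token, String name) {
        Matcher m = Pattern.compile(name + "=(\\d+)").matcher(token);
        if (!m.find()) {
            throw new IllegalArgumentException("field not found: " + name);
        }
        return Instant.ofEpochMilli(Long.parseLong(m.group(1)));
    }

    public static void main(String[] args) {
        Instant issued = timestamp(SAMPLE, "issueDate");
        Instant max = timestamp(SAMPLE, "maxDate");
        System.out.println("issued:              " + issued);
        System.out.println("max lifetime:        " + Duration.between(issued, max).toDays() + " days");
        // A token that nobody renews expires one renew interval after it was issued.
        System.out.println("expiry if unrenewed: " + issued.plus(DEFAULT_RENEW_INTERVAL));
    }
}
```

For this token, maxDate is exactly 7 days (604,800,000 ms) after issueDate (2017-06-30T04:59:22.481Z), yet the failures start after about 24 hours. That is consistent with a token that expired because nobody renewed it, rather than one that reached its maximum lifetime.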

As I understand it, Oozie renews the token after 24 hours, but the renewed token is not passed on to the Spark Launcher job. The Spark Launcher keeps looking for the old token, which is no longer in the cache.

How can I make the Spark Launcher use the new token?


Solution

  • As per my understanding, after 24 hours oozie is renewing the token

    Why do you think so? Can you point to any documentation, source code, or blog post?

    Remember that Oozie is a scheduler for batch jobs, and its canonical use case (at Yahoo!) is triggering hourly jobs.
    Only a pathological batch job would run for more than 24h (the default renewal interval of Hadoop delegation tokens), so renewing the Hadoop delegation token is not really useful in Oozie.

    But your Java app behaves like a service: it runs continuously and needs an automatic restart if it ever crashes. So you should consider...

    • either Slider, if you really want to run it inside YARN, although it is probably overkill for simply running your toy app and has many, many drawbacks (how do you inspect the logs of a running YARN job? How can you make sure that the app starts on time and is not delayed by a lack of resources? How can you make sure that your app will not be killed because YARN needs resources for a high-priority job?)
    • or a plain Linux service running on an edge node: it's a do-it-yourself task, but not very complicated, and there are tutorials on the web

    If you insist on using Oozie despite all the limitations of both YARN and Oozie, then you have to change the way your app runs. For instance: schedule the Coordinator to launch a job every 12h and pass the "nominal time" as a Workflow property; edit the Workflow to pass that time to the Java app; and edit the Java code so that the app exits at (arg + 11:58), clearing the way for the next execution.
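The exit-before-the-next-run idea can be sketched in Java as follows. This is a minimal sketch under several assumptions. Oozie is assumed to use its default UTC processing timezone, where ${coord:nominalTime()} produces values like 2017-06-30T05:00Z. The nominal time is assumed to reach main() as the Java action's only argument. The class name TimeBoxedPoller, the one-minute poll interval and the pollOnce() placeholder are all hypothetical stand-ins for the existing polling, unzip and SparkLauncher logic.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimeBoxedPoller {
    // Assumed format of the nominal time with Oozie's UTC processing timezone, e.g. "2017-06-30T05:00Z".
    private static final DateTimeFormatter OOZIE_FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm'Z'");

    // Stop 2 minutes before the next 12-hourly run starts (nominal time + 11:58).
    static final Duration RUN_WINDOW = Duration.ofHours(11).plusMinutes(58);

    // Hypothetical pause between two checks of the database table.
    static final Duration POLL_INTERVAL = Duration.ofMinutes(1);

    // Computes the instant at which this run must stop.
    static Instant deadline(String nominalTime) {
        LocalDateTime utc = LocalDateTime.parse(nominalTime, OOZIE_FORMAT);
        return utc.toInstant(ZoneOffset.UTC).plus(RUN_WINDOW);
    }

    // Hypothetical placeholder: check the table, unzip new files, launch the Spark job.
    static void pollOnce() {
    }

    public static void main(String[] args) throws InterruptedException {
        if (args.length != 1) {
            throw new IllegalArgumentException("expected one argument: the nominal time, e.g. 2017-06-30T05:00Z");
        }
        Instant stopAt = deadline(args[0]);
        while (Instant.now().isBefore(stopAt)) {
            pollOnce();
            // Never sleep past the deadline.
            Duration remaining = Duration.between(Instant.now(), stopAt);
            if (remaining.isNegative() || remaining.isZero()) {
                break;
            }
            Thread.sleep(Math.min(POLL_INTERVAL.toMillis(), remaining.toMillis()));
        }
        // Return normally instead of calling System.exit(), so the action simply completes.
    }
}
```

Returning from main() rather than calling System.exit() avoids depending on how the Oozie launcher treats exit calls. Because each 12-hour run is a new workflow launched by Oozie, its delegation tokens should never need to outlive the 24-hour renewal interval.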