apache-spark, log4j, spark-submit

Spark -- Loading log4j from JAR running spark-submit


I have developed a custom log4j configuration for my Spark application:

#######################
#    Roll by time     #
#######################
log4j.logger.myLogger=DEBUG, file 
log4j.appender.file=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.file.RollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.file.RollingPolicy.FileNamePattern = contactabilidad_%d{yyyy-MM-dd-hh}.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %C:%L - %m%n
log4j.appender.file.encoding=UTF-8
log4j.appender.file.MaxFileSize=5MB

I packaged my project into a JAR, and I run it with spark-submit.

I just want to write the logs to a file. This already works when log4j.properties is on the file system I run spark-submit from and I pass its path through the Java options:

spark-submit --class com.path.to.class.InitialContactDriver \
--driver-java-options "-Dlog4j.configuration=file:log4j.properties" \
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
--master yarn    /home/cloudera/SNAPSHOT.jar

My question is: can't I reference the log4j file that is inside the JAR?

[Image: contents of the JAR]

In the same way that I reference the class with --class com.path.to.class.InitialContactDriver

I am going to run the app in a restricted environment and would rather not upload files to the file system, just use what is already in the JAR. Is that possible? And if not, why not?

Thanks in advance! :)


Solution

  • You have to provide the --driver-class-path option in your command. Try with:

    spark-submit --class com.path.to.class.InitialContactDriver \
    --driver-java-options "-Dlog4j.configuration=file:log4j.properties" \
    --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
    --driver-class-path /home/cloudera/SNAPSHOT.jar \
    --master yarn    /home/cloudera/SNAPSHOT.jar
    

    I haven't tried it with YARN, but it worked fine in local mode and cluster mode.
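
    A possible variation, as a sketch rather than something tested on YARN: log4j 1.x first tries to parse the value of log4j.configuration as a URL, and only if that fails does it look the name up on the classpath. With the file: prefix the value is a file-system URL, so dropping the prefix is what should let log4j find the copy packaged in the JAR. This assumes log4j.properties sits at the root of the JAR and that your Spark version still uses log4j 1.x, as the configuration in the question suggests. The variable names are only for illustration, and the echo makes this a dry run.

```shell
#!/bin/sh
# Assumption: log4j.properties is at the root of the JAR
# (you can check with: jar tf "$APP_JAR").
APP_JAR=/home/cloudera/SNAPSHOT.jar

# No "file:" prefix, so log4j 1.x should treat the value as a classpath resource name.
LOG4J_OPT="-Dlog4j.configuration=log4j.properties"

# Dry run: this only prints the command so it can be inspected.
echo spark-submit --class com.path.to.class.InitialContactDriver \
  --driver-java-options "$LOG4J_OPT" \
  --conf "spark.executor.extraJavaOptions=$LOG4J_OPT" \
  --driver-class-path "$APP_JAR" \
  --master yarn "$APP_JAR"
```

    Remove the echo to actually submit the job. Whether the executors also pick up the packaged file depends on the JAR being on their classpath when log4j initializes, so it is worth checking the executor logs.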