Customize log4j for an Apache Spark job in an EMR cluster


I have a question about using log4j and its configuration file, log4j.properties, in Java for Spark jobs.

I have packaged log4j.properties together with my Spark job "jar" file. After the job is submitted to the EMR cluster, my application should initialize log4j from that file.

Here is my example code:

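    // Requires: java.io.IOException, java.io.InputStream, java.util.Properties,
    // org.apache.log4j.PropertyConfigurator
    public static void initializeLogger() {
        // Load log4j.properties from the classpath (i.e. from inside the job jar).
        try (InputStream in = RddReadUtils.class.getClassLoader()
                .getResourceAsStream("resources/log4j.properties")) {
            if (in == null) {
                // getResourceAsStream returns null when the resource is not on the classpath.
                System.err.println("resources/log4j.properties not found on the classpath");
                return;
            }
            Properties logProperties = new Properties();
            logProperties.load(in);
            PropertyConfigurator.configure(logProperties);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }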

It works on my local machine, but it does not work on the EMR cluster. Can anyone help?
Thanks a lot.


Solution

  • When you run your job on the cluster, log4j uses the properties file that is configured on the cluster. This makes sense, since it keeps your job independent of the environment.

    However, if you want to use a specific properties file, you can do the following:

    • Place your log4j.properties file somewhere on the cluster.
    • Run your job with a configuration parameter that points to the properties file.

    Assuming you run your job with spark-submit, you can run it as follows (this sets the option on the driver JVM):

    spark-submit --driver-java-options "-Dlog4j.configuration=file:///absolute/path/to/log4j.properties" job.jar
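
To see why the -Dlog4j.configuration option takes effect, here is a minimal sketch of the lookup order that log4j 1.x documents for its default initialization: the log4j.configuration system property is tried as a URL first, then as a classpath resource, and only when the property is unset does log4j fall back to log4j.properties on the classpath. Log4jConfigLocator and locate are hypothetical names used for illustration, not part of log4j or Spark, and the sketch is simplified (it ignores log4j.xml and the log4j.defaultInitOverride property).

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

/** Hypothetical helper for illustration; not part of log4j or Spark. */
final class Log4jConfigLocator {

    /**
     * Approximates where log4j 1.x would look for its configuration.
     *
     * @param systemPropertyValue value of -Dlog4j.configuration, or null if unset
     * @param loader              class loader used for classpath lookups
     * @return the resolved location, or null if nothing was found
     */
    static URL locate(String systemPropertyValue, ClassLoader loader) {
        if (systemPropertyValue == null) {
            // Property unset: fall back to the default classpath resource.
            return loader.getResource("log4j.properties");
        }
        try {
            // Property set to an absolute URL such as file:///path/to/log4j.properties.
            return new URI(systemPropertyValue).toURL();
        } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
            // Not a usable URL: treat the value as a classpath resource name.
            return loader.getResource(systemPropertyValue);
        }
    }

    public static void main(String[] args) {
        URL url = locate(System.getProperty("log4j.configuration"),
                Log4jConfigLocator.class.getClassLoader());
        System.out.println(url == null
                ? "no log4j configuration found"
                : "log4j configuration: " + url);
    }
}
```

Running this class in the driver with the same -D option should show which file would be picked up. Note that --driver-java-options only affects the driver JVM; executors are configured separately (for example through spark.executor.extraJavaOptions).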