amazon-web-services amazon-emr spark-notebook

How to load a library / Maven dependency in an AWS EMR notebook


I am using an AWS EMR notebook. Normal Scala-based Spark jobs with no third-party library dependencies run fine, but I want to load some common libraries such as typesafe-config, mysql-connector, etc.

How can I add these library dependencies to a Scala Spark notebook on AWS?

I tried adding these snippets in the first cell of the notebook, but neither worked:

    %%configure -f
    {
        "conf": {
            "spark.jars": "s3://bucket-xxx/jars/lib/config-1.3.1.jar"
        }
    }

as well as

    %%configure -f
    {
        "conf": {"spark.jars.packages": "com.typesafe:config:1.3.1,mysql:mysql-connector-java:8.0.17"},
        "jars": ["s3://bucket-xxx/jars/lib/"]
    }

Both threw the error

    <console>:29: error: object ConfigFactor is not a member of package com.typesafe.config
           import com.typesafe.config.ConfigFactor

when I tried to import the Typesafe config class:

    import com.typesafe.config.ConfigFactor

I also tried adding the Maven coordinates to the notebook metadata as

    "customDeps": [
        "com.typesafe:config:1.3.1"
    ]

and got

    error: object typesafe is not a member of package com
           import com.typesafe.config.ConfigFactor


Solution

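  • You have a typo in the import line: the class is ConfigFactory, not ConfigFactor. It should be

    import com.typesafe.config.ConfigFactory

    The first error names the package com.typesafe.config and only complains about ConfigFactor, which suggests the jar itself was found and only the class name was wrong.

    In addition, this cell is required in the Jupyter notebook:

    %%configure -f
    {
      "jars": ["s3://test/libs/config-1.3.1.jar"],
      "conf": {"spark.jars.packages": "com.typesafe:config:1.3.1"}
    }

    I hope this helps.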
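To tell a typo apart from a dependency that was never loaded, it can help to check the classpath directly before importing anything. The cell below is a minimal sketch, not part of the original answer: `isOnClasspath` is a hypothetical helper name, and `com.mysql.cj.jdbc.Driver` assumes the 8.x mysql-connector-java mentioned in the question.

```scala
import scala.util.Try

// Hypothetical helper: true when the named class can be loaded
// by the classloader of the current notebook session.
def isOnClasspath(className: String): Boolean =
  Try(Class.forName(className)).isSuccess

// Fully qualified class names to probe (the second one is an assumption
// about the driver class shipped in mysql-connector-java 8.x).
val classesToCheck = Seq(
  "com.typesafe.config.ConfigFactory",
  "com.mysql.cj.jdbc.Driver"
)

// Print one line per class so a missing jar is easy to spot.
classesToCheck.foreach { name =>
  println(s"$name -> ${isOnClasspath(name)}")
}
```

If a class reports `false`, the jar or package was probably not picked up by the `%%configure` cell. If it reports `true` and the import still fails, the import line itself is the more likely culprit.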