Tags: scala, apache-spark, amazon-s3, aws-java-sdk

java.lang.NoClassDefFoundError: com/amazonaws/auth/AWSCredentialsProvider


I am using the following tools:

  1. Spark 2.4.3
  2. Scala 2.11.12
  3. OS : Windows 10

This is my SBT code to import the libraries:

    libraryDependencies ++= Seq(
        "javassist" % "javassist" % "3.12.1.GA",
        "com.typesafe" % "config" % "1.3.4",
        "org.apache.spark" %% "spark-core" % sparkVersion,
        "org.apache.spark" %% "spark-sql" % sparkVersion,
        "com.datastax.spark" %% "spark-cassandra-connector" % "2.4.1",
        "com.twitter" % "jsr166e" % "1.1.0",
        "com.amazonaws" % "aws-java-sdk" % "1.11.592",
        "org.apache.hadoop" % "hadoop-aws" % "2.7.3",
        "org.apache.spark" %% "spark-catalyst" % sparkVersion
    )
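
(Note: hadoop-aws 2.7.3 was built against aws-java-sdk 1.7.4, so pairing it with aws-java-sdk 1.11.592 can put mismatched AWS classes on the classpath. A minimal sketch of a version-aligned pair, with the other dependencies unchanged:)

    // build.sbt (sketch): keep the AWS SDK version matched to hadoop-aws;
    // hadoop-aws 2.7.x declares aws-java-sdk 1.7.4 as its compile dependency
    libraryDependencies ++= Seq(
        "org.apache.hadoop" % "hadoop-aws" % "2.7.3",
        "com.amazonaws" % "aws-java-sdk" % "1.7.4"
    )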

My Scala code is as below:

    val rdd = sparkSession.sparkContext.parallelize(
      Seq(
        ("first", Array(2.0, 1.0, 2.1, 5.4)),
        ("test", Array(1.5, 0.5, 0.9, 3.7)),
        ("choose", Array(8.0, 2.9, 9.1, 2.5))
      )
    )
    val dfWithoutSchema = sparkSession.createDataFrame(rdd)

    sparkSession.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "XXXXXX")
    sparkSession.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", "XXXXXXX")
    sparkSession.sparkContext.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

    dfWithoutSchema.write
      .mode("overwrite")
      .parquet("s3a://test-daily-extracts/sample2")
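
(For reference, the same S3A settings can also be supplied up front via spark.hadoop.*-prefixed keys when building the session; Spark copies those into the Hadoop configuration. A minimal sketch, where the master and app name are placeholders for local testing:)

    // Sketch: spark.hadoop.* keys are copied into hadoopConfiguration,
    // so the S3A settings can live on the builder instead
    import org.apache.spark.sql.SparkSession

    val sparkSession = SparkSession.builder()
      .master("local[*]")            // assumption: local testing
      .appName("s3a-write-test")     // hypothetical app name
      .config("spark.hadoop.fs.s3a.access.key", "XXXXXX")
      .config("spark.hadoop.fs.s3a.secret.key", "XXXXXXX")
      .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
      .getOrCreate()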

When I compile through SBT I get no errors, but when I run the code I get this error:

   java.lang.NoClassDefFoundError: com/amazonaws/auth/AWSCredentialsProvider

and my stack trace is as below:

            at java.lang.Class.forName0(Native Method)
            at java.lang.Class.forName(Class.java:348)
            at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2134)
            at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2099)
            at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
            at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)
            at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
            at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
            at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
            at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
            at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
            at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
            at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.<init>(FileOutputCommitter.java:113)
            at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.<init>(FileOutputCommitter.java:88)
            at org.apache.parquet.hadoop.ParquetOutputCommitter.<init>(ParquetOutputCommitter.java:43)
            at org.apache.parquet.hadoop.ParquetOutputFormat.getOutputCommitter(ParquetOutputFormat.java:442)
            at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupCommitter(HadoopMapReduceCommitProtocol.scala:100)
            at org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol.setupCommitter(SQLHadoopMapReduceCommitProtocol.scala:40)
            at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupTask(HadoopMapReduceCommitProtocol.scala:217)
            at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:229)
            at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:170)
            at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
            at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
            at org.apache.spark.scheduler.Task.run(Task.scala:121)
            at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
            at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
            at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.ClassNotFoundException: com.amazonaws.auth.AWSCredentialsProvider
            at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
            at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
            ... 30 more
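
A NoClassDefFoundError out of Class.forName like this means Hadoop loaded S3AFileSystem (from hadoop-aws), but resolving it requires com.amazonaws.auth.AWSCredentialsProvider from the AWS SDK, which the runtime classloader cannot find, regardless of what SBT resolved at compile time. A quick probe sketch for the driver side:

    // Sketch: check whether the driver JVM can load the missing class
    try {
      Class.forName("com.amazonaws.auth.AWSCredentialsProvider")
      println("AWSCredentialsProvider is on the driver classpath")
    } catch {
      case _: ClassNotFoundException =>
        println("AWSCredentialsProvider is missing from the driver classpath")
    }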

Thanks in advance for any help.

EDIT: 2019-07-17

I updated my SBT code as below.

    libraryDependencies ++= Seq(        
        "javassist" % "javassist" % "3.12.1.GA" ,
        "com.typesafe" % "config" % "1.3.4",
        "org.apache.spark" %% "spark-core" % sparkVersion,      
        "org.apache.spark" %% "spark-sql" % sparkVersion ,
        "com.datastax.spark" %% "spark-cassandra-connector" % "2.4.1",
        "com.twitter" % "jsr166e" % "1.1.0", 
        "com.amazonaws" % "aws-java-sdk" % "1.7.4", 
        "net.java.dev.jets3t" % "jets3t" % "0.9.4",
        "org.apache.hadoop" % "hadoop-aws" % "2.7.3",
        "org.apache.hadoop" % "hadoop-client" % "2.7.3",
        "org.apache.hadoop" % "hadoop-hdfs" % "2.7.3",
        "org.apache.spark" %% "spark-catalyst" % sparkVersion       
    )
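
(One way to confirm which aws-java-sdk version actually wins dependency resolution, since several of these pull in overlapping transitive jars, is to inspect the resolved tree. A sketch, assuming the sbt-dependency-graph plugin:)

    // project/plugins.sbt (sketch): enables `sbt dependencyTree`
    addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.9.2")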

I added the below code to the driver program to print the runtime classpath.

    urlsinclasspath(getClass.getClassLoader).foreach(println)

    def urlsinclasspath(cl: ClassLoader): Array[java.net.URL] = cl match {
      case null => Array()
      case u: java.net.URLClassLoader => u.getURLs() ++ urlsinclasspath(cl.getParent)
      case _ => urlsinclasspath(cl.getParent)
    }
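
(Since the classloader walk above only inspects the driver JVM, a probe can also be run inside a task to check the executor side. A sketch using the existing sparkSession:)

    // Sketch: the closure runs on an executor, so this checks the
    // executor classpath rather than the driver's
    val seen = sparkSession.sparkContext.parallelize(Seq(1)).map { _ =>
      try { Class.forName("com.amazonaws.auth.AWSCredentialsProvider"); "found" }
      catch { case _: ClassNotFoundException | _: NoClassDefFoundError => "missing" }
    }.collect().head
    println("AWSCredentialsProvider on executor classpath: " + seen)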

I can see that aws-java-sdk-1.7.4 is now loading at runtime, and it does contain the AWSCredentialsProvider class. But I am still getting the error below. My complete trace is:

    19/07/17 17:02:25 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, XX.XX.XX.XX, executor 0): java.lang.NoClassDefFoundError: com/amazonaws/auth/AWSCredentialsProvider
            ... (the rest of the trace is identical to the one above, down to the same
            Caused by: java.lang.ClassNotFoundException: com.amazonaws.auth.AWSCredentialsProvider)
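
The WARN line shows the task failing on executor 0, i.e. the class is missing on the executors rather than the driver, which matches the driver-side classpath listing above. Jars that SBT resolves locally are not shipped to executors automatically; one common approach is to build a fat jar and submit that with spark-submit. A sketch, assuming the sbt-assembly plugin is enabled in project/plugins.sbt (the Spark dependencies themselves are usually marked "provided" in that setup):

    // build.sbt (sketch): bundle the AWS SDK and hadoop-aws into the
    // application jar so the executors receive them too
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard
      case _                             => MergeStrategy.first
    }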

Solution

  • After a lot of research on Google, I found that there was no error in my code. Even though all the jars were loading, my Spark installation was missing hadoop.dll in C:\winutils\bin and C:\Windows\System32. I downloaded hadoop.dll from https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1/bin and placed it in both directories, and it worked fine. I am not sure why the error was so misleading.
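
For background: on Windows, Hadoop locates winutils.exe and hadoop.dll through the HADOOP_HOME environment variable or the hadoop.home.dir system property, expecting the binaries under a bin subdirectory. A sketch of setting it in code, assuming the binaries live under C:\winutils\bin as above:

    // Sketch: must run before the SparkSession/SparkContext is created
    System.setProperty("hadoop.home.dir", "C:\\winutils")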

Thanks all for your help.