
Why does "spark-shell --jars" with GraphFrames jar give "error: missing or invalid dependency detected while loading class file 'Logging.class'"?


I ran the command spark-shell --jars /home/krishnamahi/graphframes-0.4.0-spark2.1-s_2.11.jar and it failed with the following errors:

error: missing or invalid dependency detected while loading class file 'Logging.class'. Could not access term typesafe in package com, because it (or its dependencies) are missing. Check your build definition for missing or conflicting dependencies. (Re-run with -Ylog-classpath to see the problematic classpath.) A full rebuild may help if 'Logging.class' was compiled against an incompatible version of com.

error: missing or invalid dependency detected while loading class file 'Logging.class'. Could not access term scalalogging in value com.typesafe, because it (or its dependencies) are missing. Check your build definition for missing or conflicting dependencies. (Re-run with -Ylog-classpath to see the problematic classpath.) A full rebuild may help if 'Logging.class' was compiled against an incompatible version of com.typesafe.

error: missing or invalid dependency detected while loading class file 'Logging.class'. Could not access type LazyLogging in value com.slf4j, because it (or its dependencies) are missing. Check your build definition for missing or conflicting dependencies. (Re-run with -Ylog-classpath to see the problematic classpath.) A full rebuild may help if 'Logging.class' was compiled against an incompatible version of com.slf4j.

I am using Spark 2.1.1, Scala 2.11.8, JDK 1.8.0_131, CentOS 7 64-bit, and Hadoop 2.8.0. Can anyone tell me which additional option I should pass so that the shell starts without these errors? Thanks in advance.


Solution

  • If you want to play with GraphFrames, use the --packages command-line option of spark-shell instead of --jars.

    --packages Comma-separated list of maven coordinates of jars to include on the driver and executor classpaths. Will search the local maven repo, then maven central and any additional remote repositories given by --repositories. The format for the coordinates should be groupId:artifactId:version.

    For graphframes-0.4.0-spark2.1-s_2.11.jar that'd be as follows:

    $SPARK_HOME/bin/spark-shell --packages graphframes:graphframes:0.4.0-spark2.1-s_2.11
    

    which I copied verbatim from the How to section of the GraphFrames project.

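    That way you don't have to search for all the (transitive) dependencies of the GraphFrames library, as Spark will resolve them for you automatically. With --jars, only the jar files you list are added to the classpath, so the libraries GraphFrames itself depends on (the errors point at com.typesafe.scalalogging) stay missing.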
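
As a minimal sketch of how the coordinate passed to --packages is put together, the snippet below assembles it in the groupId:artifactId:version format quoted above. The variable names are illustrative only. The reading of the version string (GraphFrames release, then the Spark line it was built for, then the Scala binary version) is an assumption based on the jar name in the question, which matches the asker's Spark 2.1.1 and Scala 2.11.8 setup.

```shell
#!/bin/sh
# Parts of the Maven coordinate (variable names are illustrative, not Spark settings).
GROUP_ID="graphframes"
ARTIFACT_ID="graphframes"

# Assumed meaning of the version string, read off the jar name in the question:
#   0.4.0    -> GraphFrames release
#   spark2.1 -> Spark line the package was built for
#   s_2.11   -> Scala binary version
VERSION="0.4.0-spark2.1-s_2.11"

# --packages expects groupId:artifactId:version
COORDINATE="${GROUP_ID}:${ARTIFACT_ID}:${VERSION}"
echo "$COORDINATE"

# To start the shell with this package (needs SPARK_HOME and network access):
# "$SPARK_HOME/bin/spark-shell" --packages "$COORDINATE"
```

If you run a different Spark or Scala version, the version part of the coordinate would presumably need to change to a matching GraphFrames build; check the GraphFrames project page for the builds that actually exist rather than guessing the string.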