Tags: java, cassandra, guava, cassandra-3.0, spark-cassandra-connector

Spark 1.5 and datastax-ddc-3.2.1 Cassandra Dependency Jars?


I am using Spark 1.5 and Cassandra 3.2.1. Could anyone specify the exact jars that need to be on the build path to connect to, query, and insert data into Cassandra?

Right now I am using the following jars:

spark-cassandra-connector_2.10-1.5.0-M3.jar
apache-cassandra-clientutil-3.2.1.jar
cassandra-driver-core-3.0.0-beta1-bb1bce4-SNAPSHOT-shaded.jar
spark-assembly-1.5.1-hadoop2.0.0-mr1-cdh4.2.0.jar
guava-18.0.jar
netty-all-4.0.23.Final.jar

With the above jars I am able to connect to Cassandra. I can truncate and drop tables, but I am unable to insert any data, not even with a simple insert query.

Following is the code:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

import com.datastax.driver.core.Session;
import com.datastax.spark.connector.cql.CassandraConnector;

public class Test {

    public static void main(String[] args) {

        SparkConf conf = new SparkConf()
                .setMaster("spark://blr-lt-203:7077")
                .setAppName("testinsert")
                .set("spark.cassandra.connection.host", "blr-lt-203")
                .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                .set("spark.kryoserializer.buffer.max", "1024mb");

        JavaSparkContext ctx = new JavaSparkContext(conf);

        // Open a plain driver session through the connector and run a simple insert
        CassandraConnector connector = CassandraConnector.apply(ctx.getConf());
        Session session = connector.openSession();

        session.execute("insert into test.table1 (name) values ('abcd')");

        session.close();
        ctx.stop();
    }
}

Following are the logs:

16/03/28 21:24:52 INFO BlockManagerMaster: Trying to register BlockManager
16/03/28 21:24:52 INFO BlockManagerMasterEndpoint: Registering block manager localhost:50238 with 944.7 MB RAM, BlockManagerId(driver, localhost, 50238)
16/03/28 21:24:52 INFO BlockManagerMaster: Registered BlockManager
16/03/28 21:24:53 INFO NettyUtil: Did not find Netty's native epoll transport in the classpath, defaulting to NIO.
16/03/28 21:24:53 INFO Cluster: New Cassandra host localhost/127.0.0.1:9042 added
16/03/28 21:24:53 INFO CassandraConnector: Connected to Cassandra cluster: Test Cluster

It just hangs here for some time and then times out with the following exception:

Exception in thread "main" com.datastax.driver.core.exceptions.UnavailableException: Not enough replicas available for query at consistency LOCAL_QUORUM (2 required but only 1 alive)
  1. What am I doing wrong?

  2. Please let me know what the required jars are, or whether there is a version compatibility issue.

  3. What are the most stable versions of Spark (1.5.x) and Cassandra?

Thanks in advance.


Solution

  • The problem occurs due to a conflict between versions of Google's Guava library.
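
    A quick way to check which Guava copy actually wins on the classpath (my own diagnostic sketch, not part of the original answer; the class name GuavaCheck is made up) is to print where the Guava classes are loaded from:

        import com.google.common.collect.ImmutableList;

        public class GuavaCheck {
            public static void main(String[] args) {
                // Prints the jar that Guava's classes were actually loaded from,
                // which shows which of the conflicting Guava copies is being used.
                System.out.println(ImmutableList.class
                        .getProtectionDomain()
                        .getCodeSource()
                        .getLocation());
            }
        }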

The solution is to shade the Guava library that the spark-cassandra-connector dependency pulls in. You can do so with the Maven Shade plugin. Here is my pom.xml that shades out the Guava library.

    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>

        <groupId>com.pc.test</groupId>
        <artifactId>casparktest</artifactId>
        <version>0.0.1-SNAPSHOT</version>
        <packaging>jar</packaging>

        <name>casparktest</name>
        <url>http://maven.apache.org</url>

        <properties>
            <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        </properties>

        <dependencies>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.10</artifactId>
                <version>1.5.0</version>
            </dependency>
            <dependency>
                <groupId>junit</groupId>
                <artifactId>junit</artifactId>
                <version>3.8.1</version>
                <scope>test</scope>
            </dependency>
            <dependency>
                <groupId>com.datastax.spark</groupId>
                <artifactId>spark-cassandra-connector_2.10</artifactId>
                <version>1.5.0</version>
            </dependency>
            <dependency>
                <groupId>com.datastax.cassandra</groupId>
                <artifactId>cassandra-driver-core</artifactId>
                <version>3.0.0-beta1</version>
            </dependency>
        </dependencies>

        <build>
            <plugins>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-shade-plugin</artifactId>
                    <version>2.3</version>
                    <executions>
                        <execution>
                            <phase>package</phase>
                            <goals>
                                <goal>shade</goal>
                            </goals>
                            <configuration>
                                <filters>
                                    <!-- Drop signature files so the shaded jar is not rejected as tampered -->
                                    <filter>
                                        <artifact>*:*</artifact>
                                        <excludes>
                                            <exclude>META-INF/*.SF</exclude>
                                            <exclude>META-INF/*.DSA</exclude>
                                            <exclude>META-INF/*.RSA</exclude>
                                        </excludes>
                                    </filter>
                                </filters>
                                <relocations>
                                    <!-- Relocate Guava so the connector's copy cannot clash with the driver's -->
                                    <relocation>
                                        <pattern>com.google</pattern>
                                        <shadedPattern>com.pointcross.shaded.google</shadedPattern>
                                    </relocation>
                                </relocations>
                                <minimizeJar>false</minimizeJar>
                                <shadedArtifactAttached>true</shadedArtifactAttached>
                            </configuration>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>
    </project>
    

    After that, run a Maven build; it generates a jar containing all the dependencies declared in the pom.xml, with the Guava classes shaded, and you can use that jar to submit Spark jobs.
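
    Since shadedArtifactAttached is true, the shaded jar is attached with a "shaded" classifier. As a sketch (assuming the default Maven layout and the Test class from the question; adjust the master URL and paths to your setup), the build and submit commands would look something like:

        mvn clean package
        spark-submit \
          --class Test \
          --master spark://blr-lt-203:7077 \
          target/casparktest-0.0.1-SNAPSHOT-shaded.jar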