Cassandra connector -- difference between joinWithCassandraTable and leftJoinWithCassandraTable -- Cannot Resolve symbol


I am trying to read data from Cassandra by joining an RDD against a table with the DataStax Spark Cassandra Connector. After the join, I want to sum the value column from the RDD with the value column from Cassandra. The code below works for me:

tm(
  a.joinWithCassandraTable("ks", "tbl")
    .on(SomeColumns(
      "key", "key2", "key3", "key4", "key5", "key6", "key7", "key8", "key9", "key10",
      "key11", "key12", "key13", "key14", "key15",
      "column1", "column2", "column3", "column4", "column5"))
    .select("value1")
    .map { case (ip, row) =>
      IP(ip.key, ip.key2, ip.key3, ip.key4, ip.key5, ip.key6, ip.key7, ip.key8, ip.key9, ip.key10,
        ip.key11, ip.key12, ip.key13, ip.key14, ip.key15,
        ip.column1, ip.column2, ip.column3, ip.column4, ip.column5,
        ip.value1 + row.getLong("value1"))
    }
    .saveToCassandra("ks", "tbl")
)

However, when I switch to a left join, I get "Cannot resolve symbol getLong". I believe this is because a left join does not guarantee a matching row, so the value could be missing, but I have not been able to express this in Scala:

tm(
  a.leftJoinWithCassandraTable("ks", "tbl")
    .on(SomeColumns(
      "key", "key2", "key3", "key4", "key5", "key6", "key7", "key8", "key9", "key10",
      "key11", "key12", "key13", "key14", "key15",
      "column1", "column2", "column3", "column4", "column5"))
    .select("value1")
    .map { case (ip, row) =>
      IP(ip.key, ip.key2, ip.key3, ip.key4, ip.key5, ip.key6, ip.key7, ip.key8, ip.key9, ip.key10,
        ip.key11, ip.key12, ip.key13, ip.key14, ip.key15,
        ip.column1, ip.column2, ip.column3, ip.column4, ip.column5,
        ip.value1 + row.getLong("value1"))
    }
    .saveToCassandra("ks", "tbl")
)

Any help is appreciated. If more information is needed, let me know and I will add it.


Solution

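  • With leftJoinWithCassandraTable, the second element of each pair is an Option[CassandraRow] instead of a CassandraRow: Some(row) when a matching row exists in Cassandra, None when it does not. Option has no getLong method, which is why the symbol cannot be resolved.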

    Instead of .map { case (ip, row) => ...} you can write:

    .map { case (ip, row) => 
      row match {
        case None => ip
        case Some(data) => IP(...., ip.value1 + data.getLong("value1"))
      }
    }
    

    In this case, when there is no matching data (None), you return the IP object unchanged; when there is data (Some), you construct a new IP object with the summed value.
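
The same Option handling can be tried without a Spark cluster. The following is only a minimal sketch in plain Scala, under stated assumptions: the two-field `IP`, the `Row` class with its `getLong`, and the `leftJoin` helper are hypothetical stand-ins for the asker's case class, the connector's row type, and `leftJoinWithCassandraTable`. It also shows `fold` as a more compact alternative to the `match` expression.

```scala
// Hypothetical stand-ins, not the connector's API: a cut-down IP record
// and a Row that only knows how to return Long columns by name.
final case class IP(key: String, value1: Long)

final case class Row(columns: Map[String, Long]) {
  def getLong(name: String): Long = columns(name)
}

object LeftJoinSketch {
  // Left-join semantics: every left element is kept and paired with
  // Some(row) when its key exists in the table, None otherwise.
  def leftJoin(left: Seq[IP], table: Map[String, Row]): Seq[(IP, Option[Row])] =
    left.map(ip => (ip, table.get(ip.key)))

  // None: return the incoming record unchanged.
  // Some(row): build a new record with the stored value added.
  def merge(ip: IP, row: Option[Row]): IP =
    row.fold(ip)(r => ip.copy(value1 = ip.value1 + r.getLong("value1")))

  def main(args: Array[String]): Unit = {
    val incoming = Seq(IP("a", 10L), IP("b", 5L))
    val table = Map("a" -> Row(Map("value1" -> 32L))) // no row for key "b"

    val merged = leftJoin(incoming, table).map { case (ip, row) => merge(ip, row) }
    merged.foreach(println)
  }
}
```

Here key "a" has a stored row, so its values are summed, while key "b" has none and passes through untouched. In the real job, the body of `merge` would sit inside the `.map { case (ip, row) => ... }` shown above.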