Cassandra Spark connector: joinWithCassandraTable on fields with different names


I want to join an RDD with a Cassandra table where the join key has a different name on each side. A simplified example:

case class User(id : String, name : String)

and

case class Home( address : String, user_id : String)

I would like to do:

rdd[Home].joinWithCassandraTable("testspark","user").on(SomeColumns("id"))

How can I specify the name of the field on which the join is made? I don't want to map the RDD down to just the id, because I want to keep all of the Home values after the joinWithCassandraTable.


Solution

  • You can use the "as" syntax, just as in a select, to change how the join columns are mapped.

    An example:

    import com.datastax.spark.connector._  // brings cassandraTable, SomeColumns and "as" into scope
    sc.cassandraTable[Home]("ks","home").joinWithCassandraTable("ks","user").on(SomeColumns("id" as "user_id")).collect
    

    This maps the "id" column of the user table to the "user_id" field of the Home case class, so the join key is read from user_id on the RDD side.
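
Since the question asks to keep all of the Home values after the join, here is a slightly fuller sketch of the same call with the join typed as [User], which should give (Home, User) pairs. The object name, the connection host, and the "ks" keyspace with "home" and "user" tables are assumptions for illustration. It needs the spark-cassandra-connector dependency and a reachable Cassandra node, and it expects the master to be supplied by spark-submit.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._ // cassandraTable, joinWithCassandraTable, SomeColumns, "as"

// Case classes from the question; field names follow the Cassandra column names.
case class User(id: String, name: String)
case class Home(address: String, user_id: String)

// Hypothetical driver object, named only for this example.
object JoinOnRenamedKey {
  def main(args: Array[String]): Unit = {
    // Assumption: a Cassandra node on localhost holding tables ks.home and ks.user.
    val conf = new SparkConf()
      .setAppName("join-on-renamed-key")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // Typing the join as [User] yields (Home, User) pairs, so every Home field is kept.
    val joined = sc.cassandraTable[Home]("ks", "home")
      .joinWithCassandraTable[User]("ks", "user")
      .on(SomeColumns("id" as "user_id")) // user.id is matched against Home.user_id

    joined.collect().foreach { case (home, user) =>
      println(s"${home.address} -> ${user.name}")
    }

    sc.stop()
  }
}
```

Without the [User] type parameter, as in the one-liner above, the right-hand side of each pair is a generic CassandraRow rather than a User.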