scala cassandra apache-spark cql datastax-enterprise

Importing a text file into Cassandra using Spark when there are multiple variable types


I'm using Spark to import data from text files into CQL tables (on DataStax). I've done this successfully with one file in which all columns were strings. I first created the table in CQL, then ran the following in the Spark shell (Scala):

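import com.datastax.spark.connector._  // adds saveToCassandra to RDDs

// Split each line on the pipe character (escaped, because split takes a regex)
val file = sc.textFile("file:///home/pr.txt").map(line => line.split("\\|"))
file.map(line => (line(0), line(1))).saveToCassandra("ks", "ks_pr", Seq("proc_c", "proc_d"))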
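One thing worth checking before converting types: `String.split` takes a regular expression (hence `"\\|"` rather than `"|"`), and with the default limit it drops trailing empty strings. A row whose last columns are empty therefore produces a shorter array, and `line(n)` can throw `ArrayIndexOutOfBoundsException`. A small stand-alone sketch of that behaviour (the sample line and the `fields` helper are made up for illustration):

```scala
object SplitDemo {
  // Hypothetical helper: a negative limit tells split to keep trailing empty fields.
  def fields(line: String): Array[String] = line.split("\\|", -1)

  def main(args: Array[String]): Unit = {
    val line = "42|3.5||"                 // last two columns are empty
    println(line.split("\\|").length)     // 2: trailing empty strings are dropped
    println(fields(line).length)          // 4: every column is kept
  }
}
```

Using `split("\\|", -1)` keeps the column positions stable, which matters once each position is converted to a specific type.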

The rest of the files I want to import contain columns of several types, not just strings. I've created those tables in CQL with the appropriate column types, but how do I convert each field to the right type when importing the text files in Spark?


Solution

  • For example, if proc_c is an int column and proc_d is a double column, convert the string fields with toInt and toDouble before saving:

    file.map(line => (line(0), line(1)))
        .map { case (l, r) => (l.toInt, r.toDouble) }   // convert to the CQL column types
        .saveToCassandra("ks", "ks_pr", Seq("proc_c", "proc_d"))
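With more columns and types, it can help to keep the conversion in a plain function that can be tested without Spark, and to skip malformed lines instead of letting `toInt` throw inside a task. A minimal sketch, assuming a hypothetical third text column `note` that may be empty (`ProcRow`, `ParseRows` and `note` are illustrative names, not part of the question):

```scala
import scala.util.Try

// Hypothetical row type; note is optional so an empty field can become a null in Cassandra.
final case class ProcRow(procC: Int, procD: Double, note: Option[String])

object ParseRows {
  // Parse one pipe-delimited line into typed values.
  // Returns None for lines with too few fields or unparseable numbers.
  def parseLine(line: String): Option[ProcRow] = {
    val f = line.split("\\|", -1)          // -1 keeps trailing empty fields
    if (f.length < 3) None
    else for {
      c <- Try(f(0).trim.toInt).toOption
      d <- Try(f(1).trim.toDouble).toOption
    } yield ProcRow(c, d, Some(f(2)).filter(_.nonEmpty))
  }

  def main(args: Array[String]): Unit = {
    println(parseLine("42|3.5|hello"))     // Some(ProcRow(42,3.5,Some(hello)))
    println(parseLine("42|3.5|"))          // Some(ProcRow(42,3.5,None))
    println(parseLine("x|3.5|a"))          // None
  }
}
```

In the Spark shell this would be used along the lines of `sc.textFile("file:///home/pr.txt").flatMap(line => ParseRows.parseLine(line).toList).saveToCassandra("ks", "ks_pr", SomeColumns("proc_c", "proc_d", "note"))`, assuming the table actually has those three columns. The connector should map the case class fields to columns by name (camelCase `procC` to `proc_c`), and a `None` should be written as null; check the connector documentation for your version.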