scala, apache-spark, hadoop, rdd

flatMap result for a line read from a file differs from the same line passed as a String


I have just started learning Spark and Scala. I have a file test.txt that contains one line: "My name is xyz".

When I create an RDD from it, apply the flatMap method, and print the result, I get:

My
name
is
xyz

But when the same line is passed to flatMap as a String, I get the compiler error "value split is not a member of Char":

val lines = sc.textFile("C:/test.txt")
val result = lines.flatMap(x => x.split(" "))
result.foreach(println)

val name = "My name is xyz"
val res = name.flatMap(x => x.split(" "))
//println(res)

Solution

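  • This uses sc, so lines is an RDD[String] and the work is parallelized by Spark. flatMap receives each line as a String, and String has a split method.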

    val lines = sc.textFile("C:/test.txt")
    val result = lines.flatMap(x => x.split(" "))
    result.foreach(println)
    

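    This does not involve Spark at all. It is plain Scala, and name is just a String. The elements of a String are Chars, so flatMap passes each Char to the function, and Char has no split method.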

    val name = "My name is xyz"
    val res = name.flatMap(x => x.split(" "))  // does not compile: value split is not a member of Char
    println(res)
    

    The plain-Scala equivalent of the first snippet is a collection of Strings, for example an Array of String, which approximates the lines read in by sc.textFile. Each element is then a String, so split works:

    val name = Array("My name is xyz")
    val res = name.flatMap(x => x.split(" "))
    println(res)
    

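    In the REPL this returns the following. println(res) prints only the array's default toString (the first line); the res line shows the four separate elements (note the commas):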

    [Ljava.lang.String;@16947521
    name: Array[String] = Array(My name is xyz)
    res: Array[String] = Array(My, name, is, xyz)
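
The difference can also be seen without Spark. The following is a minimal sketch in plain Scala; the object name FlatMapDemo and the value names are made up for illustration and are not part of the original question or answer. It shows that functions applied to a String see one Char at a time, and gives two ways to get the words: calling split directly on the String, or wrapping the line in a collection so that flatMap sees a String, as it does for each line of an RDD[String].

```scala
object FlatMapDemo {
  val name = "My name is xyz"

  // Collection methods on a String operate on its Chars, so the function
  // receives one Char at a time. This is why x.split(" ") is rejected
  // inside name.flatMap: Char has no split method.
  val upper: String = name.map(c => c.toUpper)

  // Option 1: for a single line, split the String directly.
  val words: Array[String] = name.split(" ")

  // Option 2: wrap the line in a collection so each element is a String,
  // mirroring the per-line elements of an RDD[String].
  val wordsFromLines: Seq[String] =
    Seq(name).flatMap(line => line.split(" "))

  def main(args: Array[String]): Unit = {
    // mkString avoids the [Ljava.lang.String;@... output of println(array)
    println(words.mkString(", "))
    wordsFromLines.foreach(println)
  }
}
```

Printing with mkString (or iterating with foreach) shows the elements themselves rather than the array reference that println(res) produces.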