Tags: scala, apache-spark, rdd

How to remove a header using the filter function in Spark?


I want to remove the header from a file. Since the file will be split into partitions, I can't just drop the first item, so I am using a filter function instead. Here is the code I am using:

val noHeaderRDD = baseRDD.filter(line=>!line.contains("REPORTDATETIME"));

The error I am getting says "error: not found: value line". What could be the issue with this code?


Solution

  • I don't think anybody pointed out the obvious fix. The infix form below works, and the dotted form line.contains(...) is also possible:

    val noHeaderRDD = baseRDD.filter(line => !(line contains("REPORTDATETIME")))
    

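    You were nearly there. It was just a syntax issue, but a significant one: written without spaces, line=>!line makes the Scala lexer read =>! as a single operator identifier rather than as => followed by !, so line is never bound as the lambda parameter and the compiler reports "not found: value line".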
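
For anyone who wants to try the corrected predicate without a cluster, here is a minimal sketch. It assumes a plain Seq[String] as a stand-in for baseRDD (RDD.filter accepts the same String => Boolean function); the object name HeaderFilter, the helper dropHeader, and the sample rows are all made up for illustration and are not from the question.

```scala
object HeaderFilter {
  // Hypothetical helper: keeps every line that does not contain the marker.
  // The spaces around "=>" matter: they make "=>" and "!" separate tokens.
  def dropHeader(lines: Seq[String], marker: String): Seq[String] =
    lines.filter(line => !line.contains(marker))

  def main(args: Array[String]): Unit = {
    // Sample rows invented for illustration; the Seq stands in for baseRDD.
    val sample = Seq(
      "REPORTDATETIME,VALUE",
      "2020-01-01 00:00,1",
      "2020-01-01 01:00,2"
    )
    // With Spark, the equivalent call would be:
    //   baseRDD.filter(line => !line.contains("REPORTDATETIME"))
    dropHeader(sample, "REPORTDATETIME").foreach(println)
  }
}
```

Keep in mind that a contains-based filter drops every line that includes the marker text, not only the first one, so it is only safe when the header string cannot appear in the data rows.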