scala apache-spark apache-spark-sql line-numbers

Find line number in an unstructured file in Scala


Hi guys, I am parsing an unstructured file for some keywords, but I can't seem to easily find the line numbers of the results I am getting.

val filePath: String = "myfile"
val myfile = sc.textFile(filePath)
// Keeps only the matching lines; their positions in the file are lost
val ora_temp = myfile.filter(line => line.contains("MyPattern")).collect
ora_temp.length

However, I not only want to find the lines that contain MyPattern, I also want something like a tuple (MyPattern line, line number).

Thanks in advance,


Solution

  • You can use zipWithIndex, which pairs each line with a zero-based index, as eliasah pointed out in a comment (the direct tuple accessor syntax is probably the most succinct way to do this), or use pattern matching in the filter, like so:

    val matchingLineAndLineNumberTuples = sc.textFile("myfile").zipWithIndex().filter({
      case (line, lineNumber) => line.contains("MyPattern")
    }).collect
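
For comparison, the tuple accessor variant mentioned in the answer might look like the sketch below. It is not the commenter's exact code: the helper name `matchingLines`, the object name `LineNumberSearch`, and the local `SparkSession` setup are assumptions made so the example stands alone. It also shifts the index by one, since `zipWithIndex` starts at 0 and returns a `Long`. The index should correspond to the line's position when reading a single file; with several input files the numbering runs on across them.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object LineNumberSearch {
  // Hypothetical helper (not part of Spark): returns (line, 1-based line number)
  // for every line that contains `pattern`.
  def matchingLines(lines: RDD[String], pattern: String): Array[(String, Long)] =
    lines
      .zipWithIndex()                                   // RDD[(String, Long)], index starts at 0
      .filter(_._1.contains(pattern))                   // tuple accessor: _1 is the line text
      .map { case (line, index) => (line, index + 1) }  // shift to 1-based line numbers
      .collect()

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for illustration; in spark-shell, `sc` already exists.
    val spark = SparkSession.builder()
      .appName("line-number-search")
      .master("local[*]")
      .getOrCreate()
    try {
      val lines = spark.sparkContext.textFile("myfile")
      matchingLines(lines, "MyPattern").foreach { case (line, lineNumber) =>
        println(s"$lineNumber: $line")
      }
    } finally {
      spark.stop()
    }
  }
}
```

In spark-shell the same chain can be applied directly to `sc.textFile("myfile")`. Filtering before `collect` keeps only the matching lines on the driver, which matters for large files.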