Matching a newline in a Scala regex when reading from a file


For processing a file with SQL statements such as:

ALTER TABLE ONLY the_schema.the_big_table
    ADD CONSTRAINT the_schema_the_big_table_pkey PRIMARY KEY (the_id);

I am using the regex:

val primaryKeyConstraintNameCatchingRegex: Regex = "([a-z]|_)+\\.([a-z]|_)+\n\\s*(ADD CONSTRAINT)\\s*([a-z]|_)+\\s*PRIMARY KEY".r

The problem is that this regex returns no matches, even though both

val alterTableRegex = "ALTER TABLE ONLY\\s+([a-z]|_)+\\.([a-z]|_)+".r

and

val addConstraintRegex = "ADD CONSTRAINT\\s*([a-z]|_)+\\s*PRIMARY KEY".r

match the intended sequences.

I suspected the newline, so I have tried \\s+, \\W+, \\s*, \\W*, \\n*, \n*, \n+, \r+, \r*, \r\\s*, \n*\\s*, \\s*\n*\\s*, and other combinations to match the whitespace between the table name and ADD CONSTRAINT, to no avail.

I would appreciate any help with this.

Edit

This is the code I am using:

import scala.util.matching.Regex
import java.io.File

import scala.io.Source


object Hello extends Greeting with App {

  val primaryKeyConstraintNameCatchingRegex: Regex = "([a-z]|_)+\\.([a-z]|_)+\r\\s*(ADD CONSTRAINT)\\s*([a-z]|_)+\\s*PRIMARY KEY".r


  readFile

  def readFile: Unit = {
    val fname = "dump.sql"
    val fSource = Source.fromFile(fname)


    for (line <- fSource.getLines) {
      val matchExp = primaryKeyConstraintNameCatchingRegex.findAllIn(line).foreach(
        segment => println(segment)
      )
    }

    fSource.close()


  }
}

Edit 2

Another strange behavior: when matching with

"""[a-z_]+(\.[a-z_]+)\s*A""".r

matches are found and they include the A, but when I use

"""[a-z_]+(\.[a-z_]+)\s*ADD""".r

which differs only by the added DD, nothing is matched.


Solution

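  • The problem is that you read the file line by line (see the for (line <- fSource.getLines) loop). getLines splits the input at line breaks, so the regex is only ever applied to one line at a time and never sees the table name and ADD CONSTRAINT in the same string.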

    You need to grab the contents as a single string to be able to match across line breaks.

    val fSource = Source.fromFile(fname).mkString
    val matchExps = primaryKeyConstraintNameCatchingRegex.findAllIn(fSource)
    

    Now fSource holds the whole file contents as one string, and matchExps is an iterator over all matches found in it.
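
A minimal sketch of the whole-file approach follows. It is an illustration, not the asker's final code: the object name PrimaryKeyNames and the helpers readAll and findPrimaryKeys are hypothetical, and the pattern replaces the literal \n or \r with \s+ on the assumption that the dump may use either \n or \r\n line endings. It also closes the Source after reading, which the two-line snippet above leaves out.

```scala
import scala.io.Source
import scala.util.matching.Regex

// Hypothetical helper object, for illustration only.
object PrimaryKeyNames {

  // \s+ covers spaces, tabs, \n and \r\n, so the line break between the
  // table name and ADD CONSTRAINT matches regardless of line-ending style.
  val primaryKeyRegex: Regex =
    """([a-z_]+)\.([a-z_]+)\s+ADD CONSTRAINT\s+([a-z_]+)\s+PRIMARY KEY""".r

  // Reads the whole file into one string and always closes the source.
  def readAll(path: String): String = {
    val source = Source.fromFile(path)
    try source.mkString finally source.close()
  }

  // Returns (schema, table, constraint name) for every match in the text.
  def findPrimaryKeys(sql: String): List[(String, String, String)] =
    primaryKeyRegex
      .findAllMatchIn(sql)
      .map(m => (m.group(1), m.group(2), m.group(3)))
      .toList
}
```

With the file from the question, PrimaryKeyNames.findPrimaryKeys(PrimaryKeyNames.readAll("dump.sql")) should yield one tuple per PRIMARY KEY constraint. Passing the same text to findPrimaryKeys one line at a time yields nothing, which is the behavior described in the question.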