Search code examples
regexplsql

RegEx: match multi-line block by 1st and last lines


There is a large block of text (a PL/SQL DDL script).

I need to remove all CREATE TABLE blocks where table name contains a dollar sign ("$"), using Python re library.

The PL/SQL script is relatively well formatted. In particular, the first line will always start with CREATE TABLE and will also contain the name of the table (i.,e. there is no line break between CREATE TABLE and the table name); the closing round bracket of the entire CREATE TABLE ... statement will always be the on a separate line and it will always be the only character in the line.

Example:

CREATE TABLE "SCH"."SOME$TABLE"
( "NAME" VARCHAR2(100 CHAR),
"DATA1" BLOB,
"TIMESTAMP1" DATE
)

I am looking for a regular expression compatible with Python's re library that would match the above 5 lines.

I tried the following:

(?m)^CREATE TABLE.*\$.*[\s\S]^\)$

But it is not matching anything.

What am I doing wrong?


Solution

  • "... What am I doing wrong?"

    I believe your error is with the [\s\S] character group, as it is only matching a single character

    You could append a +? quantifier, to match up to the ^\)$.

    (?m)^CREATE TABLE.*\$.*[\s\S]+?^\)$