Search code examples
regex

how can I make a Regex to match prefix and multiple suffixes at same time?


I've encountered this problem: I need to clean some corrupted data that has the following pattern:

VALUES ( 1', NULL, 'Smith', 'Jackson',...
VALUES ( 2', 'jon@doe', 'John', 'Doe',...

Basically, I need to remove the extra ', for that I first need to match it...

So I came up with this

(?<=VALUES \(  \d+)(.*')(?=(, '[a-zA-Z])||(^NULL,$))

Unfortunately, it's not matching anything, the \d+ is specially problematic.

Any idea how I can isolate the 1', 2', ... (only the ' after the number that can have N digits) and be able to match the prefix for 'string or NULL


Solution

  • One option is to use

    \b(VALUES\s*\(\s*\d+)'(?=,\s*(?:NULL|'[a-zA-Z]))
    

    Explanation

    • \b A word boundary
    • ( Capture group 1 (to be used in the replacement keeping this part)
      • VALUES\s*\(\s*\d+ Match VALUES, ( and then match 1+ digits
    • ) Close group 1
    • ' Match the ' that you want to remove
    • (?=,\s*(?:NULL|'[a-zA-Z]) Positive lookahead to make sure that directly to the right is a comma followed by NULL or ' and a char a-z

    Regex demo

    In the replacement use group 1.

    Output

    VALUES (  1, NULL, 'Smith', 'Jackson',...
    VALUES (  2, 'jon@doe', 'John', 'Doe',...