regex, regex-lookarounds, pcre

Match quotes within a string, but only if they meet a minimum character length


I'm currently using the following regex to find/substitute all matching quotes in a string:

(?|"([^"\n]*)"|“([^\'\n]*)”)

Text with 3 quotes:

"Caerphilly pecorino red leicester." Ricotta brie fromage lancashire hard cheese mozzarella queso queso. Feta hard cheese "bavarian" bergkase cheese strings swiss fromage frais bocconcini fondue. The big cheese lancashire fromage.

Squirty cheese rubber cheese bocconcini. "Melted cheese pepper jack fondue cheeseburger rubber cheese squirty cheese taleggio caerphilly. Cheese and wine fondue cheesy grin melted cheese halloumi goat gouda manchego." Cheeseburger babybel feta cheesy grin airedale halloumi edam rubber cheese. Chalk and cheese.

On regex101: https://regex101.com/r/A7pbL4/1

The above works as expected and identifies all 3 quotes. However, I'm struggling to match only the quotes whose content is at least 20 characters long. That should match only 2 of the quotes above.

I tried appending {20,}, but didn't have much luck:

(?|"([^"\n]*)"|“([^\'\n]*)”){20,}
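Python's built-in `re` module has no branch-reset groups, so this sketch checks only the straight-quote half of the attempt against the first paragraph of the sample text. It illustrates two things, assuming `re` behaves like PCRE for these simple constructs: the trailing `{20,}` repeats the whole quoted group (so it needs 20 or more quotes back to back), and moving the limit inside the quotes as `[^"\n]{20,}` fixes that but lets the closing quote of a short quote act as an opening quote when enough text follows it. The names `attempt` and `inside` are illustrative only.

```python
import re

# First paragraph of the sample text from the question.
SAMPLE = (
    '"Caerphilly pecorino red leicester." Ricotta brie fromage lancashire '
    'hard cheese mozzarella queso queso. Feta hard cheese "bavarian" bergkase '
    'cheese strings swiss fromage frais bocconcini fondue. The big cheese '
    'lancashire fromage.'
)

# Straight-quote half of the attempt: {20,} applies to the WHOLE group,
# so it can only match a run of 20 or more consecutive quotes.
attempt = re.compile(r'(?:"([^"\n]*)"){20,}')

# Minimum length placed on the character class inside the quotes instead.
inside = re.compile(r'"([^"\n]{20,})"')

if __name__ == "__main__":
    print(attempt.findall(SAMPLE))  # []
    print(inside.findall(SAMPLE))   # ['Caerphilly pecorino red leicester.']
    # Pitfall: the closing quote of "short" is reused as an opening quote,
    # so the text BETWEEN the two quotes is returned.
    print(inside.findall('"short" filler text that runs past twenty chars "x"'))
```

The last call returns `[' filler text that runs past twenty chars ']`, which is why short quotes need to be consumed explicitly rather than just filtered out by length.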

Thanks for your help.


Solution

  • You can use the (*SKIP)(*F) backtracking control verbs:

    (?:"[^"\n]{0,19}"|“[^'\n]{0,19}”)(*SKIP)(*F)|(?|"([^"\n]*)"|“([^'\n]*)”)
    

    See the regex demo.

    Details:

    • (?:"[^"\n]{0,19}"|“[^'\n]{0,19}”)(*SKIP)(*F): match either
      • "[^"\n]{0,19}" - " + zero to 19 chars other than " and newline + "
      • | - or
      • “[^'\n]{0,19}” - “ + zero to 19 chars other than ' and newline + ”
      • (*SKIP)(*F) - once a short quote is matched, fail the match and resume searching right after it, so short quotes are never returned and their closing quote cannot start a new match
    • | - or
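    • (?|"([^"\n]*)"|“([^'\n]*)”) - otherwise, match either " + zero or more chars other than " and newline + ", or “ + zero or more chars other than ' and newline + ”. Because (?|...) is a branch reset group, the capturing group in each alternative is numbered 1, so Group 1 always holds the text between the quotes.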
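Python's built-in `re` module supports neither the `(*SKIP)(*F)` verbs nor branch-reset groups, so the pattern above can't be used there as-is. This sketch swaps in a different, well-known technique with the same effect. Short-quote alternatives consume short quotes without capturing anything. Long-quote alternatives capture. The code keeps only matches where a capturing group took part. Two assumptions: `long_quotes` is a hypothetical helper name, and inside curly quotes the sketch excludes `”` rather than `'`, on the assumption that the `'` in `[^'\n]` was meant to be `”`.

```python
import re

# Python's `re` has no (*SKIP)(*F) and no branch reset (?|...), so this
# emulates them: the first two alternatives consume short quotes (0-19 chars)
# WITHOUT capturing, so their closing quote is never reused as an opening
# one; the last two alternatives capture quotes of 20+ chars.
# Assumption: inside curly quotes we exclude ” rather than ' (see lead-in).
PATTERN = re.compile(
    r'"[^"\n]{0,19}"'      # short straight quote: consume and discard
    r'|“[^”\n]{0,19}”'     # short curly quote: consume and discard
    r'|"([^"\n]{20,})"'    # long straight quote: content in group 1
    r'|“([^”\n]{20,})”'    # long curly quote: content in group 2
)


def long_quotes(text: str) -> list[str]:
    """Hypothetical helper: contents of all quotes at least 20 chars long."""
    found = []
    for m in PATTERN.finditer(text):
        # Short quotes match with both groups unset (None) and are dropped,
        # which plays the role of (*SKIP)(*F) in the PCRE pattern.
        content = m.group(1) if m.group(1) is not None else m.group(2)
        if content is not None:
            found.append(content)
    return found


# The two paragraphs from the question, separated by a blank line.
SAMPLE = (
    '"Caerphilly pecorino red leicester." Ricotta brie fromage lancashire '
    'hard cheese mozzarella queso queso. Feta hard cheese "bavarian" bergkase '
    'cheese strings swiss fromage frais bocconcini fondue. The big cheese '
    'lancashire fromage.\n\n'
    'Squirty cheese rubber cheese bocconcini. "Melted cheese pepper jack '
    'fondue cheeseburger rubber cheese squirty cheese taleggio caerphilly. '
    'Cheese and wine fondue cheesy grin melted cheese halloumi goat gouda '
    'manchego." Cheeseburger babybel feta cheesy grin airedale halloumi edam '
    'rubber cheese. Chalk and cheese.'
)

if __name__ == "__main__":
    for quote in long_quotes(SAMPLE):
        print(quote)
    # The closing quote of "short" is consumed, so nothing is returned here.
    print(long_quotes('"short" filler text that runs past twenty chars "x"'))
```

On the sample this prints the two long quotes and skips "bavarian". The order of the alternatives matters: the short-quote branches are tried first at each position, just as the (*SKIP)(*F) part comes first in the PCRE pattern.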