javascript regex regex-lookarounds

Capture a string (from a certain point) with regex not starting with certain letters


I am in the process of writing a regex that captures everything from a certain point if the string doesn't start with certain letters.

More precisely I want to capture everything from - up until a comma, only IF this string doesn't start with pt.

en-GB should capture -GB

But if the word starts with pt I simply want to skip the capture:

pt-BR should capture nothing.

I created this regex:

-[^,]*

This works nicely, except that it also captures from strings beginning with pt.

Unfortunately I can't use lookbehinds since they are not supported by JS, so I tried using a negative lookahead like this:

^(?!pt).*

The problem is that this captures the entire string, not just the part from - onward. I tried replacing .* with something that starts capturing at - but I haven't been successful so far.

I am fairly new to regex, so any guidance would be helpful.
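To make the two attempts concrete, here is a quick check of how they behave. The sample strings are the ones from the examples above; the variable names are only illustrative.

```javascript
// Sample inputs taken from the examples in the question.
const samples = ['en-GB', 'pt-BR'];

// First attempt: matches from the hyphen, regardless of how the string starts.
const fromHyphen = samples.map((s) => {
  const m = s.match(/-[^,]*/);
  return m ? m[0] : null;
});
console.log(fromHyphen); // [ '-GB', '-BR' ]

// Second attempt: the lookahead rejects strings starting with "pt",
// but .* then consumes the whole string instead of starting at "-".
const wholeString = samples.map((s) => {
  const m = s.match(/^(?!pt).*/);
  return m ? m[0] : null;
});
console.log(wholeString); // [ 'en-GB', null ]
```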


Solution

  • To match either pt- followed by any two letters, or any other two letters, at the start of the string, you may use

    text.match(/^(?:pt-[a-zA-Z]{2}|[a-zA-Z]{2})/)
    

    Details:

    • ^ - start of string
    • (?:pt-[a-zA-Z]{2}|[a-zA-Z]{2}) - either of the two alternatives:
      • pt-[a-zA-Z]{2} - pt- and any two ASCII letters
      • | - or
      • [a-zA-Z]{2} - any two ASCII letters
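As a rough illustration of what this pattern returns, the sketch below wraps it in a small helper. The name localePrefix and the sample inputs are assumptions made for the example, not part of the answer.

```javascript
// Hypothetical helper; the pattern itself is the one shown above.
function localePrefix(text) {
  const m = text.match(/^(?:pt-[a-zA-Z]{2}|[a-zA-Z]{2})/);
  // match() returns null when nothing matches, so guard before indexing.
  return m ? m[0] : null;
}

console.log(localePrefix('en-GB')); // 'en'
console.log(localePrefix('pt-BR')); // 'pt-BR'
console.log(localePrefix('1234'));  // null
```

Because the pt- alternative is listed first, a pt-XX string is kept whole, while any other string yields only its first two letters.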

    It looks like you need to use the .replace method for some reason. In that case, you may use

    text.replace(/\b(?!pt-)([A-Za-z]{2})-[a-zA-Z]{2}\b/, '$1')
    

    Details:

    • \b - a word boundary
    • (?!pt-) - no pt- allowed immediately to the right of the current location
    • ([A-Za-z]{2}) - Group 1: any two ASCII letters
    • - - a hyphen
    • [a-zA-Z]{2} - any two ASCII letters
    • \b - a word boundary
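A minimal sketch of the .replace approach in use follows. The helper names stripRegion and captureRegion are hypothetical. The second function is not from the answer: it is one possible way, under the assumption that the string starts with the language code, to get the capture the question asks for (the part from - up to a comma) using only a lookahead.

```javascript
// Hypothetical helper around the .replace pattern shown above:
// drops the "-XX" part unless the code starts with "pt-".
function stripRegion(text) {
  return text.replace(/\b(?!pt-)([A-Za-z]{2})-[a-zA-Z]{2}\b/, '$1');
}

console.log(stripRegion('en-GB')); // 'en'
console.log(stripRegion('pt-BR')); // 'pt-BR' (unchanged)

// Alternative sketch (an assumption, not the answerer's method):
// reject strings starting with "pt", skip up to the hyphen,
// then capture from "-" up to the next comma.
function captureRegion(text) {
  const m = text.match(/^(?!pt)[^-,]*(-[^,]*)/);
  return m ? m[1] : null;
}

console.log(captureRegion('en-GB')); // '-GB'
console.log(captureRegion('pt-BR')); // null
```

Without the g flag, .replace only changes the first match, so a comma-separated list of codes would need /g added to the pattern.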