javascript regex negative-lookbehind

How to match all strings with a specific REGEX pattern that do not start with a defined character without using a negative lookbehind


I have a use case in which I want to match all http:// and https:// strings in a text using JavaScript, but only when they are not immediately preceded by a " or '. If they are preceded by another character, e.g., a whitespace character, I still want to match only the http:// or https://, without the preceding character.

My current regex uses a negative lookbehind but I just realized that this is not supported in Safari:

/(?<!["'])(https?:\/\/)/gm

So what would be an alternative for using a negative lookbehind to match the following strings in a text:

  • http:// -> should match http://
  • https:// -> should match https://
  • xhttps:// -> should match https:// whereby x can be any character except " and '
  • "https:// -> should NOT match at all

Solution

  • No need for a lookbehind here; use a character class and groups:

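    // Test inputs from the question; the last one must not match.
    const vars = ['http://', 'https://', 'xhttps://', '"https://']
    // Consume one non-quote character (or match at the start of the string), then capture the scheme.
    const re = /(?:[^'"]|^)(https?:\/\/)/
    // Group 1 holds the scheme; fall back to '' when nothing matches.
    vars.forEach(x =>
       console.log(x, '->', (x.match(re) || ['',''])[1])
    )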
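The snippet above checks one string at a time. To collect every occurrence in a longer text, as the question asks, one possible approach is to add the g flag and iterate with matchAll (which throws without that flag). This is only a sketch: the helper name findUnquotedSchemes and the sample text are made up for illustration.

```javascript
// Hypothetical helper: collect every http:// or https:// that is not
// directly preceded by a single or double quote.
function findUnquotedSchemes(text) {
  // Same pattern as in the answer, plus the g flag required by matchAll.
  const re = /(?:[^'"]|^)(https?:\/\/)/g;
  const results = [];
  for (const m of text.matchAll(re)) {
    const scheme = m[1];
    // m.index is the start of the whole match, which may include one
    // preceding character; shift it to the start of the captured group.
    const index = m.index + m[0].length - scheme.length;
    results.push({ scheme, index });
  }
  return results;
}

// Illustrative input: one plain, one quoted, one prefixed occurrence.
const sample = 'see http://a.example and "https://b.example" or xhttps://c.example';
console.log(findUnquotedSchemes(sample));
```

One caveat of consuming the preceding character instead of looking behind: in back-to-back input such as http://http://, the final / of the first match is already consumed, so the second occurrence is not reported.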

    Regex:

    (?:[^'"]|^)(https?:\/\/)
    

    EXPLANATION

    --------------------------------------------------------------------------------
      (?:                      group, but do not capture:
    --------------------------------------------------------------------------------
        [^'"]                    any character except: ''', '"'
    --------------------------------------------------------------------------------
       |                        OR
    --------------------------------------------------------------------------------
        ^                        the beginning of the string
    --------------------------------------------------------------------------------
      )                        end of grouping
    --------------------------------------------------------------------------------
      (                        group and capture to \1:
    --------------------------------------------------------------------------------
        http                     'http'
    --------------------------------------------------------------------------------
        s?                       's' (optional; matched if present)
    --------------------------------------------------------------------------------
        :                        ':'
    --------------------------------------------------------------------------------
        \/                       '/'
    --------------------------------------------------------------------------------
        \/                       '/'
    --------------------------------------------------------------------------------
      )                        end of \1
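
Because the first group consumes the preceding character rather than merely looking at it, a replacement has to put that character back. A minimal sketch, assuming the goal is to rewrite each unquoted scheme: the prefix is captured in its own group and re-emitted. The helper name replaceUnquotedSchemes and its callback parameter are hypothetical.

```javascript
// Hypothetical helper: rewrite every unquoted http:// or https://
// while leaving the character in front of it untouched.
function replaceUnquotedSchemes(text, rewrite) {
  // Group 1: the preceding non-quote character, or '' at the start of the string.
  // Group 2: the scheme itself.
  const re = /([^'"]|^)(https?:\/\/)/g;
  return text.replace(re, (match, prefix, scheme) => prefix + rewrite(scheme));
}

// Illustrative usage: upper-case the scheme so the effect is visible.
const out = replaceUnquotedSchemes('x http://a "https://b" yhttps://c', s => s.toUpperCase());
console.log(out); // x HTTP://a "https://b" yHTTPS://c
```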