Tags: javascript, regex, ecmascript-2017

Regex - find all words starting with $_ that fall anywhere between template strings


Can I solve the following with a single regex? I know it can be done with two separate regexes, but I'm curious whether it can be done with just one.

Find all instances of words (variables) that begin with $_, but only when they fall anywhere between the template "interpolate" delimiters (<%= and %>).

So with the following text:

<div>
    <% if ( $_createDiv) { %>
      <div>Div created!</div>
    <% } %>
    <h2>
      <span><%=   $_var1   %></span>
    </h2>
    <div><%= markdown.toHTML($_var2 )  %></div>
    <div><%= $_var3 +' more text ' + $_var4 %></div>
</div>

Expected results should only be: $_var1, $_var2, $_var3, $_var4.

Note: $_createDiv should not be returned, since it is inside an "evaluate" delimiter (<% instead of <%=).

https://regex101.com/r/dAesYE/1

Is it possible to do this with a single regex, or would I need to use two? With two, I could first run /(?<=<%=).*(?=%>)/gm to find all text between the delimiters, then loop through the results and run /\B\$_\w+/gm on each to get the variables. I'm just curious whether a single regex can do it.
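For reference, a minimal sketch of that two-step approach. It is an illustration under a few assumptions: the helper name findInterpolatedVars is hypothetical, and the first step uses a lazy capturing group (/<%=([\s\S]*?)%>/g) instead of the lookbehind pattern quoted above, so that it also runs where lookbehind is unavailable and does not over-match when two <%= ... %> blocks share a line.

```javascript
// Hypothetical helper; the name and structure are assumptions, not from the post.
function findInterpolatedVars(template) {
  // Step 1: capture the body of each <%= ... %> block (lazy, spans line breaks).
  const blockRe = /<%=([\s\S]*?)%>/g;
  const found = [];
  let block;
  while ((block = blockRe.exec(template)) !== null) {
    // Step 2: collect every $_ variable inside that body.
    const vars = block[1].match(/\B\$_\w+/g);
    if (vars) found.push(...vars);
  }
  return found;
}

const sample = [
  "<% if ( $_createDiv) { %>",
  "<span><%=   $_var1   %></span>",
  "<div><%= markdown.toHTML($_var2 )  %></div>",
  "<div><%= $_var3 +' more text ' + $_var4 %></div>",
].join("\n");

console.log(findInterpolatedVars(sample));
// → [ '$_var1', '$_var2', '$_var3', '$_var4' ]
```

The <% if ... %> line is skipped because step 1 only matches blocks that open with <%=.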

For context, I'm trying to find them so that I can run a replace to surround the variable name with a function, such as:

myFunc($_var1)

Solution

  • Assuming you can target ECMAScript 2018+ compatible environments (the pattern relies on lookbehind and the s flag, both introduced in ES2018), you can use

    /(?<=<%=(?:(?!<%=|%>).)*)\B\$_\w+(?=(?:(?!<%=|%>).)*%>)/gs
    

    See the regex demo. If you cannot rely on ES2018 support, keep your current two-step approach.

    Details:

    • (?<=<%=(?:(?!<%=|%>).)*) - a positive lookbehind that requires its pattern to match immediately to the left of the current location:
      • <%= - a <%= substring
      • (?:(?!<%=|%>).)* - any single char (including line breaks, because of the s flag), zero or more occurrences, as many as possible, that does not start a <%= or %> char sequence
    • \B\$_\w+ - a $ char that is either at the start of the string or preceded by a non-word char, then _, then one or more word chars
    • (?=(?:(?!<%=|%>).)*%>) - a positive lookahead that requires its pattern to match immediately to the right of the current location:
      • (?:(?!<%=|%>).)* - any single char, zero or more occurrences, as many as possible, that does not start a <%= or %> char sequence
      • %> - a %> substring.

    See the JavaScript demo:

    const regex = /(?<=<%=(?:(?!<%=|%>).)*)\B\$_\w+(?=(?:(?!<%=|%>).)*%>)/gs;
    const text = "<div>\r\n    <% if ( $_createDiv) { %>\r\n      <div>Div created!</div>\r\n    <% } %>\r\n    <h2>\r\n      <span><%=   $_var1   %></span>\r\n    </h2>\r\n    <div><%= markdown.toHTML($_var2 )  %></div>\r\n    <div><%= $_var3 +' more text ' + $_var4 %></div>\r\n</div>";
    console.log(text.match(regex));
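
For the replacement mentioned in the question, the same pattern can be passed to String.prototype.replace, with $& standing for the matched variable name. This is a sketch on a shortened, single-line sample; myFunc is the wrapper name from the question, and the variable names pattern, template and wrapped are only illustrative.

```javascript
// Same pattern as above; needs an ES2018+ engine (lookbehind and the s flag).
const pattern = /(?<=<%=(?:(?!<%=|%>).)*)\B\$_\w+(?=(?:(?!<%=|%>).)*%>)/gs;

// Shortened sample: one evaluate block, two interpolate blocks.
const template =
  "<% if ( $_createDiv) { %><span><%= $_var1 %></span><div><%= $_var3 + $_var4 %></div>";

// "$&" in the replacement string inserts the matched text, so each
// variable found inside <%= ... %> is wrapped as myFunc(<variable>).
const wrapped = template.replace(pattern, "myFunc($&)");

console.log(wrapped);
// → <% if ( $_createDiv) { %><span><%= myFunc($_var1) %></span><div><%= myFunc($_var3) + myFunc($_var4) %></div>
```

$_createDiv is left untouched because no <%= precedes it without an intervening %>.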