javascript regex string-interpolation

RegEx: Detect string interpolation but not inside attribute


I am creating Web Components and I need a regular expression that captures instances of string interpolation in a template string.
For example with the following string:

<img src="${this.image}"/><h5>${this.title}</h5><p>${this.description}</p>

The instances of string interpolation are inside ${} and can be captured with: (this(\.\w+)).
But I do not want to capture the first instance because it is inside an attribute.

I have tried the expression ((?<!".+)this(\.\w+)+(?!.+")), which works with a multiline string (each tag on its own line) but not on a single line.

Here is my RegExr demo.
Perhaps someone with more experience in RegEx can help me out.

Edit

To keep the question simple and to the point I didn't mention this...

The reason I want to do this is that I am using Lit to create Web Components. I have already created an interpolator function that returns a Lit TemplateResult. Now I want to highlight data with <b> tags, so I want to replace RegEx matches with the unsafeHTML directive, but unsafeHTML throws an error when used inside attributes.
Here is my interpolator function:

export function FillTemplate(templateString: string, data: any): TemplateResult {
    let regex = /((?<!".+)this(\.\w+)+(?!.+"))/g;
    if (regex.test(templateString)) {
        templateString = templateString.replace(/((?<!".+)this(\.\w+)+(?!.+"))/g, "unsafeHTML($1)");
    }
    return new Function('html', 'unsafeHTML', "return html`"+templateString +"`;").call(data, html, unsafeHTML);
};

.... I will also give this a think, maybe it's better for me to test the object keys and not the template string...


Solution

  • You can use a negative lookbehind to skip interpolations that directly follow a quote: (?<!["'])\$\{this(?:\.\w+)+\}. This will exclude the src="${this.image}" in your example, but it also wrongly excludes interpolations that follow a quote in HTML text, such as <p>Quote: "${this.quote}"</p>

    To avoid that, you can use a more specific negative lookbehind that only rejects a quoted attribute inside an HTML tag: (?<!<\w+ (\w+=["'][^"']*["'] )*\w+=["'])\$\{this(?:\.\w+)+\}.

    Here is an example with both regexes:

    const regex1 = /(?<!["'])\$\{this(?:\.\w+)+\}/g;
    const regex2 = /(?<!<\w+ (\w+=["'][^"']*["'] )*\w+=["'])\$\{this(?:\.\w+)+\}/g;
    
    [
      '<img src="${this.image}"/><h5>${this.title}</h5><p>${this.description}</p><p>Quote: "${this.quote}"</p>',
      '<img foo="bar" src="${this.image}"/><h5>${this.title}</h5><p>${this.description}</p><p>Quote: "${this.quote}"</p>'
    ].forEach(str => {
      console.log(str);
      console.log('- regex1:', str.match(regex1));
      console.log('- regex2:', str.match(regex2));
    });

    Explanation of regex2:

    • (?<! -- negative lookbehind start
    • <\w+  -- start of HTML tag followed by a space, such as <img 
    • (\w+=["'][^"']*["'] )* -- 0+ attributes of form attr="value" , with trailing space
    • \w+=["'] -- attribute start, such as src=" or src='
    • ) -- negative lookbehind end
    • \$\{this -- literal ${this
    • (?:\.\w+)+ -- non-capture group for 1+ patterns of .something
    • \} -- literal }
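
To connect this back to the question's goal, here is a minimal sketch of how regex2 might be used to wrap only the non-attribute interpolations in unsafeHTML(...). The function name wrapTextInterpolations is hypothetical, and the pattern is regex2 with two small changes that are assumptions of this sketch: the group inside the lookbehind is made non-capturing, and the this.x.y expression gets its own capture group. It only rewrites the template string; it does not call Lit.

```javascript
// regex2 with the lookbehind's inner group made non-capturing, so that the
// only capture group is the `this.x.y` expression inside ${...}.
const textInterpolation =
  /(?<!<\w+ (?:\w+=["'][^"']*["'] )*\w+=["'])\$\{(this(?:\.\w+)+)\}/g;

// Hypothetical helper: wraps every interpolation that is NOT an attribute
// value in unsafeHTML(...). A replacer function is used so that the "$"
// characters in the output are never treated as replacement patterns.
function wrapTextInterpolations(templateString) {
  return templateString.replace(
    textInterpolation,
    (match, expr) => '${unsafeHTML(' + expr + ')}'
  );
}

const sample =
  '<img src="${this.image}"/><h5>${this.title}</h5><p>${this.description}</p>';
console.log(wrapTextInterpolations(sample));
// <img src="${this.image}"/><h5>${unsafeHTML(this.title)}</h5><p>${unsafeHTML(this.description)}</p>
```

Because the pattern has the g flag and is only passed to .replace(), there is no need for a separate regex.test() call first; a string with no matches is returned unchanged.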

    Note: If your regex engine does not support negative lookbehind (notably Safari before version 16.4), you can turn the lookbehind into an optional capture group and, in the .replace() callback, put the match back unchanged whenever that group matched.
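
The lookbehind-free variant described in the note could look roughly like the sketch below. The names attributeAware and wrapWithoutLookbehind are hypothetical, and wrapping in unsafeHTML(...) is taken from the question's goal rather than from the answer's regexes. The attribute prefix that the lookbehind rejected becomes an optional capture group; when it participates in a match, the callback returns the match untouched.

```javascript
// Optional group 1: the tag start plus attributes up to an opening quote.
// Group 2: the `this.x.y` expression inside ${...}.
const attributeAware =
  /(<\w+ (?:\w+=["'][^"']*["'] )*\w+=["'])?\$\{(this(?:\.\w+)+)\}/g;

// Hypothetical helper: if the attribute prefix matched, the interpolation is
// an attribute value, so the whole match is restored as-is. Otherwise the
// interpolation is wrapped in unsafeHTML(...).
function wrapWithoutLookbehind(templateString) {
  return templateString.replace(
    attributeAware,
    (match, attrPrefix, expr) =>
      attrPrefix !== undefined ? match : '${unsafeHTML(' + expr + ')}'
  );
}

const sample =
  '<img foo="bar" src="${this.image}"/><h5>${this.title}</h5>';
console.log(wrapWithoutLookbehind(sample));
// <img foo="bar" src="${this.image}"/><h5>${unsafeHTML(this.title)}</h5>
```

One gap in this sketch compared with the lookbehind version: the prefix is consumed by the match, so if a single tag contains two interpolated attributes, the second one no longer has a tag start in front of it and would be wrapped. Handling that case would need a different approach, such as matching whole tags first.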