I am working on creating Web Components and I need a regular expression that captures instances of string interpolation in a template string.
For example with the following string:
<img src="${this.image}"/><h5>${this.title}</h5><p>${this.description}</p>
The instances of string interpolation are inside ${} and can be captured with: (this(\.\w+))
But I do not want to capture the first instance because it is inside an attribute.
I have tried the expression ((?<!".+)this(\.\w+)+(?!.+")), which works with a multiline string (each tag on its own line) but not on a single line.
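The difference between the two cases can be reproduced with a short sketch. The variable names here are illustrative only, and the multiline input is simply the example string with a newline inserted between tags. Because `.` does not match a newline, the `".+` lookbehind only sees quotes on the current line, which is likely why the multiline version appears to work.

```javascript
// The expression from the question, unchanged.
const askerRegex = /((?<!".+)this(\.\w+)+(?!.+"))/g;

const singleLine =
  '<img src="${this.image}"/><h5>${this.title}</h5><p>${this.description}</p>';

// Same markup, but with each tag on its own line.
const multiLine = singleLine.replace(/></g, '>\n<');

// On a single line, every "this" has a quote somewhere before it,
// so the negative lookbehind rejects all of them.
const singleLineMatches = singleLine.match(askerRegex) || [];

// With one tag per line, only the <img> line contains a quote.
const multiLineMatches = multiLine.match(askerRegex) || [];

console.log(singleLineMatches); // []
console.log(multiLineMatches);  // [ 'this.title', 'this.description' ]
```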
Here is my RegExr demo.
Perhaps someone with more experience in RegEx can help me out.
To keep the question simple and to the point I didn't mention this...
The reason I want to do this is that I am using Lit to create Web Components. I have already created an interpolator function that returns a Lit TemplateResult. Now I want to highlight data with <b> tags, so I want to replace the RegEx matches with the unsafeHTML directive, but unsafeHTML throws an error when used inside attributes.
Here is my interpolator function:
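import { html, TemplateResult } from 'lit';
import { unsafeHTML } from 'lit/directives/unsafe-html.js';

export function FillTemplate(templateString: string, data: any): TemplateResult {
  let regex = /((?<!".+)this(\.\w+)+(?!.+"))/g;
  if (regex.test(templateString)) {
    // Wrap each matched this.x expression in the unsafeHTML directive.
    templateString = templateString.replace(/((?<!".+)this(\.\w+)+(?!.+"))/g, "unsafeHTML($1)");
  }
  // Evaluate the string as a tagged template literal, with `this` bound to data.
  return new Function('html', 'unsafeHTML', "return html`" + templateString + "`;").call(data, html, unsafeHTML);
}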
.... I will also give this a think, maybe it's better for me to test the object keys and not the template string...
A simple negative lookbehind can exclude interpolations that directly follow a quote: (?<!["'])\$\{this(?:\.\w+)+\}
This will exclude the src="${this.image}" in your example, but it gives a false positive for quoted HTML text: the interpolation in <p>Quote: "${this.quote}"</p> is excluded as well.
To avoid that, use a stricter negative lookbehind that only matches a quoted attribute inside an HTML tag: (?<!<\w+ (\w+=["'][^"']*["'] )*\w+=["'])\$\{this(?:\.\w+)+\}
Here is an example with both regexes:
const regex1 = /(?<!["'])\$\{this(?:\.\w+)+\}/g;
const regex2 = /(?<!<\w+ (\w+=["'][^"']*["'] )*\w+=["'])\$\{this(?:\.\w+)+\}/g;
[
  '<img src="${this.image}"/><h5>${this.title}</h5><p>${this.description}</p><p>Quote: "${this.quote}"</p>',
  '<img foo="bar" src="${this.image}"/><h5>${this.title}</h5><p>${this.description}</p><p>Quote: "${this.quote}"</p>'
].forEach(str => {
  console.log(str);
  console.log('- regex1:', str.match(regex1));
  console.log('- regex2:', str.match(regex2));
});
Explanation of regex2:
(?<! -- negative lookbehind start
<\w+  -- start of HTML tag and space, such as <img followed by a space
(\w+=["'][^"']*["'] )* -- 0+ attributes of form attr="value", with trailing space
\w+=["'] -- attribute start, such as src=" or src='
) -- negative lookbehind end
\$\{this -- literal ${this
(?:\.\w+)+ -- non-capture group for 1+ patterns of .something
\} -- literal }
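As a rough sketch of how regex2 could feed the replacement step from the question, the function below wraps only text-level interpolations in unsafeHTML(...) and returns the rewritten template string. The names textInterpolation and wrapTextInterpolations are hypothetical. Two changes from regex2 are assumptions made for this example: the group inside the lookbehind is non-capturing, and a capture group is added around the this.x expression so it can be reused in the replacement.

```javascript
// regex2 with a non-capturing group in the lookbehind and a capture
// group around the this.x expression (both changes are for this sketch).
const textInterpolation =
  /(?<!<\w+ (?:\w+=["'][^"']*["'] )*\w+=["'])\$\{(this(?:\.\w+)+)\}/g;

// Hypothetical helper: rewrites ${this.x} to ${unsafeHTML(this.x)},
// but leaves interpolations inside quoted attributes untouched.
function wrapTextInterpolations(templateString) {
  return templateString.replace(
    textInterpolation,
    (_match, expr) => '${unsafeHTML(' + expr + ')}'
  );
}

const wrapped = wrapTextInterpolations(
  '<img foo="bar" src="${this.image}"/><h5>${this.title}</h5>'
);
console.log(wrapped);
// <img foo="bar" src="${this.image}"/><h5>${unsafeHTML(this.title)}</h5>
```

The replacer function is used instead of a "$1" replacement string so that the literal ${ in the output cannot be mistaken for a replacement pattern.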
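Note: If your regex engine does not support negative lookbehind (notably older versions of Safari), you can turn the lookbehind into an optional capture group that matches the attribute prefix, and put the captured text back unchanged in the .replace() call.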
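One way the capture-group fallback might look is sketched below. The names withOptionalPrefix and wrapWithoutLookbehind are hypothetical, and the pattern is an adaptation of regex2 rather than something taken from the answer: the attribute prefix becomes an optional capture group, and the replacer returns the whole match unchanged whenever that group matched.

```javascript
// Lookbehind-free variant: group 1 optionally captures the attribute
// prefix (for example <img src="), group 2 captures the this.x expression.
const withOptionalPrefix =
  /(<\w+ (?:\w+=["'][^"']*["'] )*\w+=["'])?\$\{(this(?:\.\w+)+)\}/g;

// Hypothetical helper: if the prefix group matched, the interpolation is
// inside an attribute, so the original text is restored as-is.
function wrapWithoutLookbehind(templateString) {
  return templateString.replace(
    withOptionalPrefix,
    (match, attrPrefix, expr) =>
      attrPrefix !== undefined ? match : '${unsafeHTML(' + expr + ')}'
  );
}

const fallbackResult = wrapWithoutLookbehind(
  '<img src="${this.image}"/><h5>${this.title}</h5><p>Quote: "${this.quote}"</p>'
);
console.log(fallbackResult);
// <img src="${this.image}"/><h5>${unsafeHTML(this.title)}</h5><p>Quote: "${unsafeHTML(this.quote)}"</p>
```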