I have the following regex:
/<(?:textarea|select)[\s\S]*?>[\s\S]*?(\{\{\{variable:(.+?)\}\}\})[\s\S]*?<\/(?:textarea|select)>|<(?:input)[\s\S]+?(value=[\s\S]+?)(\{\{\{variable:(.+?)\}\}\})[\s\S]+?>|(\{\{\{variable:(.+?)\}\}\})/im
And this (shortened) HTML document:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Test</title>
</head>
<body>
<section id="about">
<div class="container about-container">
<div class="row">
<div class="col-md-12">
{{{block:welcome-intro}}}
</div>
</div>
</div>
</section>
<section id="services">
<div class="container">
<div class="row">
<div class="col-md-12">
<p>You are using system version: {{{variable:system_version}}}</p>
<p>Your address: {{{variable:contact-email-address}}}</p>
<form action="http://k.loc/content/view/welcome" class="default-form" enctype="multipart/form-data" method="post" accept-charset="utf-8">
<input type="hidden" name="csrfkcmstoken" value="94ee71ada809b9a79d1b723c81020c78" />
<div class="row">
<div class="col-sm-12 form-error"></div>
</div>
<div class="row"><div class="col-sm-12"><fieldset id="personalinfo"><legend>Personal information</legend><div class="row"><div class="col-sm-12">
<div class="control-label">
<label for="testinput">Name<span class="form-validation-required"> * </span></label>
</div>
<div class="hint-text">Enter at least 2 characters and a maximum of 12 characters.</div><input id="testinput" name="testinput" placeholder="Enter your name here." class="input-group width-50" type="text" value="{{{variable:system_name}}} {{{variable:system_login}}}"><div class="row"><div class="col-sm-12"><div class="form-error"></div></div></div></div></div><div class="row"><div class="col-sm-12">
<div class="control-label">
<label for="testpassword">Password</label>
</div>
<div class="hint-text">Your password must be at least 12 characters long, contain 1 special character, 1 nunber, 1 lower case character and 1 upper case character.</div><input id="testpassword" name="testpassword" placeholder="Enter your password here." class="input-group width-50" type="password"><div class="row"><div class="col-sm-12"><div class="form-error"></div></div></div></div></div></fieldset></div></div><div class="row"><div class="col-sm-12"><fieldset id="bioinfo"><legend>Biographical information</legend><div class="row"><div class="col-sm-12">
<div class="control-label">
<label for="testtextarea">Biography</label>
<span class="hint-text">A minimum of 40 characters and a maximum of 255 is allowed. This hint is displayed inline.</span>
</div>
<textarea id="testtextarea" name="testtextarea" placeholder="Please enter your biography here." class="input-group-wide width-100" rows="5" cols="80">{{{variable:system_name}}}
{{{variable:system_login}}}</textarea><div class="row"><div class="col-sm-12"><div class="form-error"></div></div></div></div></div><div class="row"><div class="col-sm-12">
<div class="control-label">
<label for="testsummernote">Interests</label>
<span class="hint-text">A minimum of 40 characters is required. This hint is displayed inline.</span>
</div>
<textarea id="testsummernote" name="testsummernote" class="wysiwyg-editor" placeholder="Please enter your interests here."><p>{{{variable:system_name}}}<br></p><p>{{{variable:system_login}}}</p><p>{{{variable:activate_url}}}<br></p></textarea></div></div></fieldset></div></div><div class="row"><div class="col-sm-12"><button name="testsubmit" id="testsubmit" type="submit" class="btn primary">Submit<i class="zmdi zmdi-arrow-forward"></i></button></div></div>
</form> </div>
</div>
</div>
</section>
</body>
</html>
Parsing the above HTML document to find {{{variable:whatever}}} yields this result:
Array
(
[0] => Array
(
[0] => {{{variable:system_version}}}
[1] => {{{variable:contact-email-address}}}
[2] => <input type="hidden" name="csrfkcmstoken" value="94ee71ada809b9a79d1b723c81020c78" />
<div class="row"><div class="col-sm-12 form-error"></div></div>
<div class="row"><div class="col-sm-12"><fieldset id="personalinfo"><legend>Personal information</legend><div class="row"><div class="col-sm-12">
<div class="control-label"><label for="testinput">Name<span class="form-validation-required"> * </span></label></div>
<div class="hint-text">Enter at least 2 characters and a maximum of 12 characters.</div>
<input id="testinput" name="testinput" placeholder="Enter your name here." class="input-group width-50" type="text" value="{{{variable:system_name}}} {{{variable:system_login}}}">
[3] => <textarea id="testtextarea" name="testtextarea" placeholder="Please enter your biography here." class="input-group-wide width-100" rows="5" cols="80">{{{variable:system_name}}} {{{variable:system_login}}}</textarea>
[4] => <textarea id="testsummernote" name="testsummernote" class="wysiwyg-editor" placeholder="Please enter your interests here."><p>{{{variable:system_name}}}<br></p><p>{{{variable:system_login}}}</p><p>{{{variable:activate_url}}}<br></p></textarea>
)
)
[0] and [1] are correct, as they do not appear within a select/textarea/input tag. [3] and [4] are correct, because they are only encapsulated by one select/textarea/input tag.
I am still learning regexes and do not understand all the concepts yet, so please excuse any wrong terminology, but it looks as if some kind of greedy match is happening. I am expecting to see only <input id="testinput"...{{{variable:...}}}"> at index [2].
The end goal is to only replace these placeholders with different data if they are not inside a textarea/select/input.
Why would index [2] match so many elements, and how can this be fixed?
Parsing HTML with regular expressions is generally frowned upon, but this expression might be slightly closer to what you have in mind, though I am not sure:
<(?:textarea|select).*?>.*?(\{\{\{variable:(.*?)\}\}\}).*?<\/(?:textarea|select)>|<(?:input).+?(value=.*?)(\{\{\{variable:(.+?)\}\}\})?.*?>|(\{\{\{variable:(.*?)\}\}\})
It can also be simplified; for instance, the backslash escapes on the braces are unnecessary (the slash in </ only needs escaping while / is the regex delimiter):
<(?:textarea|select).*?>.*?({{{variable:(.*?)}}}).*?</(?:textarea|select)>|<(?:input).+?(value=.*?)({{{variable:(.+?)}}})?.*?>|({{{variable:(.*?)}}})
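As a quick check of the escaping point, here is a minimal sketch assuming PHP's PCRE engine, where a brace that does not start a valid quantifier such as {2,3} is treated as a literal character. The sample string and the variable names are made up for illustration, and ~ is used as the delimiter so that a / in the pattern would not need escaping either.

```php
<?php
// Same placeholder pattern, once with and once without backslash escapes.
// "~" is the delimiter here, so "/" would not need escaping in either pattern.
$escaped   = '~\{\{\{variable:(.*?)\}\}\}~';
$unescaped = '~{{{variable:(.*?)}}}~';

// Hypothetical sample input.
$sample = '<p>{{{variable:system_version}}}</p>';

preg_match($escaped, $sample, $a);
preg_match($unescaped, $sample, $b);

// Compare the full match and the captured name from both patterns.
var_dump($a === $b);
```

Keeping the escapes is harmless and arguably easier to read; dropping them is purely cosmetic.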
In both expressions, the idea is to add an optional group to the input alternative, so that it distinguishes between input elements with and without a variable placeholder.
// Flags: s lets . match newlines as well, i makes the match case-insensitive.
$re = '/<(?:textarea|select).*?>.*?(\{\{\{variable:(.*?)\}\}\}).*?<\/(?:textarea|select)>|<(?:input).+?(value=.*?)(\{\{\{variable:(.+?)\}\}\})?.*?>|(\{\{\{variable:(.*?)\}\}\})/si';
$str = '<section id="services">
<div class="container">
<div class="row">
<div class="col-md-12">
<p>You are using system version: {{{variable:system_version}}}</p>
<p>Your address: {{{variable:contact-email-address}}}</p>
<form action="http://k.loc/content/view/welcome" class="default-form" enctype="multipart/form-data" method="post" accept-charset="utf-8">
<input type="hidden" name="csrfkcmstoken" value="94ee71ada809b9a79d1b723c81020c78" />
<div class="row">
<div class="col-sm-12 form-error"></div>
</div>
<div class="row"><div class="col-sm-12"><fieldset id="personalinfo"><legend>Personal information</legend><div class="row"><div class="col-sm-12">
<div class="control-label">';
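// PREG_SET_ORDER groups the results per match: $matches[0] holds the first
// match together with all of its capture groups, $matches[1] the second, etc.
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);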
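On the "why": in the original pattern, [\s\S]+? after <input is lazy, but it may still run past the closing > of a tag. The engine starts at the first <input (the hidden one) and keeps expanding until value= followed by a variable is found, which is why index [2] spans several elements. A possible variation, not taken from the expressions above, is to use [^>]* so that an input match cannot leave its own tag, and to do the replacement with preg_replace_callback. This is only a sketch, assuming PHP 7 or later; the $values array, the sample markup, and the replacement strings are hypothetical.

```php
<?php
// Hypothetical replacement data; the keys and values are assumptions.
$values = [
    'system_version' => '1.2.3',
    'system_name'    => 'Example',
];

// Alternatives 1 and 2 consume whole textarea/select elements and input tags
// without capturing a variable; only alternative 3 captures a variable name.
// [^>]* cannot cross a closing ">", so an <input> match stays inside one tag.
$re = '~<(textarea|select)\b[^>]*>.*?</\1>|<input\b[^>]*>|\{\{\{variable:(.+?)\}\}\}~si';

// Hypothetical sample markup, shortened from the kind of document in the question.
$html = '<p>Version: {{{variable:system_version}}}</p>'
      . '<input type="hidden" name="token" value="abc" />'
      . '<input type="text" value="{{{variable:system_name}}}">'
      . '<textarea rows="5">{{{variable:system_name}}}</textarea>';

$result = preg_replace_callback($re, function (array $m) use ($values) {
    // Group 2 is only set when the standalone placeholder alternative matched.
    if (isset($m[2]) && $m[2] !== '') {
        return $values[$m[2]] ?? $m[0]; // unknown names are left untouched
    }
    return $m[0]; // form element: put it back unchanged
}, $html);

echo $result, PHP_EOL;
```

With this input, only the placeholder inside the paragraph is replaced; the two inputs and the textarea are returned as they were. The sketch still has the usual limits of regex-based HTML handling: an attribute value that contains a literal > or a nested element of the same name would break it, in which case a real HTML parser is the safer choice.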