I am in the middle of making a simple pattern matcher using regex, which takes my regex patterns and produces a new string in the format I desire. What seemed like a simple program at first became very complex when I noticed that adjacent regex patterns conflicted with each other: the operation could no longer be performed correctly, because the newly formed string contained characters that matched a later pattern even though they came from something I had just replaced. (I know it's probably a little confusing, so I'll provide an example.)
var str = "I am the greatest";
var r1 = /(am)/g;
var r2 = /(i)/ig;
var newstr = str.replace(r1,"<i>$1</i>").replace(r2,"<h1>$1</h2>");
console.log(newstr);
//returns "<h1>I</h2> <<h1>i</h2>>am</<h1>i</h2>> the greatest"
I know that this is a naive example; however, it illustrates my point perfectly. What I would like is for the second (and every subsequent) replacement to perform its match on the original string, but do its replacement on the mutated string, so that the newstr var in the above example would read "<h1>I</h2> <i>am</i> the greatest". I've thought of using source maps: keep a map of the regexes and use a custom replace function which references the map to perform each replacement at the correct location. But I can't seem to get enough of a grasp on source maps to implement this. Any help would be appreciated.
You could come up with a sequence of characters that you expect never to find in the string, use that sequence to temporarily wrap the results of all your replacements, then strip that sequence after all replacements are finished.
For example, choosing the sequence to be #{...}, you'd prepend a pattern for it to all your regex patterns. Something like:
var seq = /#\{(.*?)\}/g; // our sequence -- #{...}

// Prepend (#\{(.*?)\})| to the given regex
var newExpression = function(regex) {
    var splitRegex = regex.toString().split('/'),
        flags = splitRegex.pop(); // the last entry holds the flags, e.g. 'g'
    splitRegex.shift(); // get rid of the first blank entry from the opening '/' in the regex
    // slice(1, -2) strips the leading '/' and the trailing '/g' from seq
    return new RegExp('(' + seq.toString().slice(1, -2) + ')|' + splitRegex.join('/'), flags);
};

var r1 = newExpression(/(am)/g); // equivalent to /(#\{(.*?)\})|(am)/g
var r2 = newExpression(/(i)/ig); // equivalent to /(#\{(.*?)\})|(i)/gi
would do it, if you don't want to manually add (#\{(.*?)\})| to the beginning of all your patterns. We do this so that we can recognize this sequence in subsequent passes and not touch it.
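As an alternative sketch of the same helper, assuming an ES2015+ engine where RegExp objects expose both `source` and `flags`, the pattern text can be read directly instead of splitting the `toString()` output. The name `protect` is hypothetical and not part of the code above.

```javascript
const seq = /#\{(.*?)\}/g; // the #{...} marker sequence

// Hypothetical helper (assumes ES2015+ for RegExp.prototype.flags):
// builds /(<marker>)|<original pattern>/ with the original flags.
function protect(regex) {
  return new RegExp('(' + seq.source + ')|' + regex.source, regex.flags);
}

const r1 = protect(/(am)/g);
const r2 = protect(/(i)/ig);

console.log(r1.source); // (#\{(.*?)\})|(am)
console.log(r2.flags);  // gi
```

Either version should produce an equivalent regex; this one just avoids depending on the exact string form of a regex literal.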
Next, make sure to stick #{ on the beginning of all matches and } on the end;
str.replace(r1, '#{<i>$1</i>}')...
would accomplish that. Unfortunately, this isn't quite intelligent enough for us: we need to leave items that match our sequence (#{...}) alone; in other words, replace them with themselves. Here's a function that'll do that for us nicely:
var replaceFunc = function(match) {
    // Already wrapped in #{...}? Return it unchanged; otherwise wrap it in the bound tag.
    return match.match(seq)
        ? match
        : '#{<' + this.tag + '>' + match + '</' + this.tag + '>}';
};
Then use it like:
var newStr = str.replace(r1, replaceFunc.bind({tag: 'i'}))
                .replace(r2, replaceFunc.bind({tag: 'h1'}))
                .replace(seq, '$1'); // strip the sequence, leaving the desired string
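Putting the pieces together, the whole approach can be sketched end to end as follows. This is a variant under stated assumptions rather than the exact code above: it assumes ES2015+, uses closures instead of `bind`, and the names `protect`, `wrapWith`, and `applyRules` are hypothetical.

```javascript
// Marker sequence #{...}; assumed never to occur in the input text.
const seq = /#\{(.*?)\}/g;

// Hypothetical helper: prefix a pattern with the marker alternative.
const protect = (regex) =>
  new RegExp('(' + seq.source + ')|' + regex.source, regex.flags);

// Hypothetical helper: build a replacer that wraps fresh matches in <tag>
// and returns already-marked segments unchanged. The second argument is
// capture group 1, which is defined only when the marker branch matched.
const wrapWith = (tag) => (match, marked) =>
  marked !== undefined
    ? match
    : '#{<' + tag + '>' + match + '</' + tag + '>}';

// Hypothetical driver: apply each [regex, tag] rule in order, then strip markers.
function applyRules(str, rules) {
  const wrapped = rules.reduce(
    (acc, [regex, tag]) => acc.replace(protect(regex), wrapWith(tag)),
    str
  );
  return wrapped.replace(seq, '$1');
}

const result = applyRules('I am the greatest', [
  [/(am)/g, 'i'],
  [/(i)/ig, 'h1'],
]);

console.log(result); // <h1>I</h1> <i>am</i> the greatest
```

The second rule no longer touches the `i` inside `<i>am</i>`, because that text is already wrapped in the marker when the second pass runs.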
Of course, I understand that you won't necessarily be using HTML tags in your actual implementation, and this sequence might not be sufficient. But you should now be able to easily modify seq, replaceFunc, and/or the objects to which you bind replaceFunc to suit your needs.
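One way seq might be adapted, as a rough sketch: because the #{...} pattern stops at the first }, a replacement that itself contains } would end the marker early. The variant below assumes two private-use code points (U+E000 and U+E001) never occur in the input and checks that assumption up front. The names `OPEN`, `CLOSE`, and `assertMarkerFree` are hypothetical.

```javascript
// Assumption: U+E000 and U+E001 (private-use code points) never occur in the input.
const OPEN = '\uE000';
const CLOSE = '\uE001';

// Same role as the #{...} marker: group 1 captures the protected content.
const seq = new RegExp(OPEN + '([^' + CLOSE + ']*)' + CLOSE, 'g');

// Fail fast if the "never occurs" assumption is violated.
function assertMarkerFree(str) {
  if (str.includes(OPEN) || str.includes(CLOSE)) {
    throw new Error('input already contains a marker character');
  }
  return str;
}

const input = assertMarkerFree('use {braces} freely');
const marked = input.replace(
  /\{braces\}/g,
  (m) => OPEN + '<b>' + m + '</b>' + CLOSE
);
const stripped = marked.replace(seq, '$1');

console.log(stripped); // use <b>{braces}</b> freely
```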
Here's a JSFiddle. Best of luck!