javascript html regex replace lookbehind

JavaScript RegExp for matching text which isn't part of an HTML tag


I'm trying to find a way to highlight some text in HTML. The following HTML is given:

<div>This text contains matching words like word1 and word2 and xyzword1xyz and word2xyz and xyzword2 and finally word1word3</div>

The list of words which should be surrounded by a <span> is:

var array = ['word1','word2', 'word1word3'];

My current JavaScript:

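// returnString holds the HTML of the div; it is defined outside this snippet.
$.each(array, function (index, elem) {
    // Skip very short words and fragments of "span"; returning true continues the loop.
    if (elem.length < 3 || elem === "pan" || elem === "spa" || elem === "span") return true;
    // Match elem only if an even number of single quotes follows it,
    // i.e. only if it is not inside a single-quoted attribute value.
    var re = new RegExp(elem + "(?=([^']*'[^']*')*[^']*$)", "gi");
    returnString = returnString.replace(re, "<span class='markedString colorword1orword2'>$&</span>");
});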

The resulting div would look like:

<div>This text contains matching words like <span class='markedString colorword1orword2'>word1</span> and <span class='markedString colorword1orword2'>word2</span> and xyz<span class='markedString colorword1orword2'>word1</span>xyz and <span class='markedString colorword1orword2'>word2</span>xyz and xyz<span class='markedString colorword1orword2'>word2</span> and finally <span class='markedString colorword1orword2'><span class='markedString colorword1orword2'>word1</span>word3</span></div>

Because of the lookahead in the current regexp, nothing inside the single-quoted value class='markedString colorword1orword2' is matched.
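
A minimal sketch of what that lookahead does, runnable in Node.js without jQuery. The helper name outsideQuotes and the sample string are assumptions made for illustration only. The lookahead succeeds only when an even number of single quotes follows the match, so text inside a quoted attribute value is skipped, while an attribute name in front of the opening quote is not:

```javascript
// Hypothetical helper: builds the same kind of regex as the loop above.
function outsideQuotes(word) {
    // The lookahead requires an even number of single quotes
    // between the match and the end of the string.
    return new RegExp(word + "(?=([^']*'[^']*')*[^']*$)", "gi");
}

var sample = "<span class='markedString'>word1</span> markedString";

// "markedString" inside the quotes is skipped; the one after the tag matches.
var inside = sample.replace(outsideQuotes("markedString"), "[$&]");

// "class" sits before the opening quote, so it still matches inside the tag.
var attr = sample.replace(outsideQuotes("class"), "[$&]");

console.log(inside);
console.log(attr);
```

The second call shows the weak spot: the quotes protect attribute values, but not tag or attribute names.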

Problem: if the array looked like

var array = ['word1','word2', 'class'];

I would end up with

<div>This text contains matching words like <span <span class='markedString colorword1orword2'>class</span>='markedString colorword1orword2'>word1</span> and <span <span class='markedString colorword1orword2'>class</span>='markedString colorword1orword2'>word2</span> and xyz<span <span class='markedString colorword1orword2'>class</span>='markedString colorword1orword2'>word1</span>xyz and <span <span class='markedString colorword1orword2'>class</span>='markedString colorword1orword2'>word2</span>xyz and xyz<span <span class='markedString colorword1orword2'>class</span>='markedString colorword1orword2'>word2</span> and finally <span <span class='markedString colorword1orword2'>class</span>='markedString colorword1orword2'><span <span class='markedString colorword1orword2'>class</span>='markedString colorword1orword2'>word1</span>word3</span></div>

This example is somewhat contrived, but other search words could also occur inside the HTML tags themselves.

I need a way to simulate regexp-lookbehind so that I can make a rule like:

match everything that is not between <span and >, but allow nested matches such as <span>adsa<span>asdsa</span></span>

Does any regexp guru have an idea how this could be achieved?


Solution

  • You can try something like this (no looping):

    var $div = $('#the_id_of_the_div'),
        array = ['word1','word2', 'word1word3'],
        re = new RegExp(array.join('|'), 'gi'),
        divHTML = $div.text().replace(re, "<span class='markedString colorword1orword2'>$&</span>");
    $div.html(divHTML);
    

    This is just an example; you probably get the div from some jQuery object outside the snippet in your post.
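
    One caveat with array.join('|'): alternation tries the alternatives from left to right, so 'word1' wins before 'word1word3' is ever tried, and any regex metacharacters in a word are read as regex syntax. A minimal sketch of one way to handle both, assuming the hypothetical helper names escapeRegExp and buildWordRegex (neither appears in the snippet above):

    ```javascript
    // Hypothetical helper: escape characters that have a special meaning in a regex.
    function escapeRegExp(text) {
        return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    }

    // Hypothetical helper: longest words first, so "word1word3" is tried before "word1".
    function buildWordRegex(words) {
        var sorted = words.slice().sort(function (a, b) { return b.length - a.length; });
        return new RegExp(sorted.map(escapeRegExp).join("|"), "gi");
    }

    var wordRe = buildWordRegex(["word1", "word2", "word1word3"]);
    var marked = "word1 and word1word3".replace(wordRe, "<b>$&</b>");
    console.log(marked);
    ```

    The <b> wrapper is only there to keep the sketch short; the same regex can be passed to the replace call above.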


    EDIT

    If you have several divs in a wrapper, you can do something like this:

    var array = ['word1','word2', 'word1word3'],
        re = new RegExp(array.join('|'), 'gi');
    $('#wrapper div').each(function () {
        var divHTML = $(this).text().replace(re, "<span class='markedString colorword1orword2'>$&</span>");
        $(this).html(divHTML);
    });
    

    A live demo is available at jsFiddle.
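
    Note that .text() drops any markup that is already inside the div. If existing tags have to survive, one alternative (a different technique from the snippets above) is to split the HTML into tags and text and to replace only in the text parts. A minimal sketch, assuming well-formed markup with no > inside attribute values; the function name highlightOutsideTags is hypothetical:

    ```javascript
    // Hypothetical helper: wrap matches in a span, but never touch anything between < and >.
    function highlightOutsideTags(html, words) {
        var re = new RegExp(words.join("|"), "gi");
        // Splitting on a capturing group keeps the tags in the result array, at the odd indexes.
        var parts = html.split(/(<[^>]*>)/);
        return parts.map(function (part, i) {
            // Odd indexes are tags: leave them unchanged.
            if (i % 2 === 1) return part;
            return part.replace(re, "<span class='markedString'>$&</span>");
        }).join("");
    }

    var result = highlightOutsideTags("<p class='x'>class and word1</p>", ["word1", "class"]);
    console.log(result);
    ```

    Because all words go into one regex and the string is scanned once, the inserted spans are never scanned again, so a word such as 'class' or 'span' cannot match inside them.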