Tags: javascript, regex, emoticons

Implementing RegExp with \B for emoticons not matching for emoticons containing letters


I am developing a chat client for a game project and am in the process of implementing emoticons. The basic rule for where emoticons should show up in the chat is that they must not appear when they are directly next to text.

I created the Regular Expression: \B(emoticontext)\B.

Unfortunately, I am having a problem: this works fine for every emoticon except the ones that contain letters (e.g. :D, O_o).

I am not sure how to remedy the situation.

function parseEmoticons(text) {
    var pattern;
    emoticons.forEach(function (emoticon) {
        pattern = new RegExp("\\B" + emoticon.string + "\\B", 'g');
        text = text.replace(pattern, emoticon.img);
    });
    return text;
}

Here is a part of the emoticons array, for context.

    { 'string': ':\\)', 'img': '<img src="' + imgpath + 'emoticons/smile.png" class="emoticon"/>' },
    { 'string': ':O', 'img': '<img src="' + imgpath + 'emoticons/surprised.png" class="emoticon"/>' },
    { 'string': ':D', 'img': '<img src="' + imgpath + 'emoticons/happy.png" class="emoticon"/>' },
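A minimal reproduction of the problem, assuming a hypothetical `imgpath` value and a trimmed-down `emoticons` array in the question's format. `\B` only matches where both neighbouring characters are of the same kind (two word characters or two non-word characters). For `:)` surrounded by spaces every edge is non-word/non-word, so `\B` matches; for `:D` the edge between `D` (a word character) and the following space is a word boundary, so the trailing `\B` fails.

```javascript
// Hypothetical image path, for illustration only.
var imgpath = "img/";

// Trimmed-down emoticon list in the same format as the question.
var emoticons = [
    { 'string': ':\\)', 'img': '<img src="' + imgpath + 'emoticons/smile.png" class="emoticon"/>' },
    { 'string': ':D', 'img': '<img src="' + imgpath + 'emoticons/happy.png" class="emoticon"/>' }
];

// The question's original function, unchanged.
function parseEmoticons(text) {
    var pattern;
    emoticons.forEach(function (emoticon) {
        pattern = new RegExp("\\B" + emoticon.string + "\\B", 'g');
        text = text.replace(pattern, emoticon.img);
    });
    return text;
}

// ":)" is replaced: every edge is non-word/non-word, so \B matches.
console.log(parseEmoticons("hi :) there"));
// ":D" is left alone: the edge between "D" and " " is a word boundary (\b), not \B.
console.log(parseEmoticons("hi :D there"));
```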

Solution

  • do not appear when they are directly next to text

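    That sounds more like you want to check for surrounding whitespace, not \B (a "non-word-boundary"). \B only matches between two word characters or between two non-word characters, so it works around symbols like : and ) but fails next to a letter: in " :D ", the position between D and the following space is a word boundary, so the trailing \B can never match.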

    That is:

    var pattern = new RegExp('(^|\\s)' + emoticon.string.replace(/\W/g, '\\$&') + '(?!\\S)', 'g');
    text = text.replace(pattern, function (m0, m1) { return m1 + emoticon.img; });
    
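Dropped into the question's loop, the pattern above might look like the following sketch. It assumes a hypothetical `imgpath` and a small `emoticons` array whose `string` values are stored unescaped (`':)'` rather than `':\\)'`), since the function escapes them itself.

```javascript
// Hypothetical image path, for illustration only.
var imgpath = "img/";

// Emoticon strings are stored unescaped; parseEmoticons escapes them.
var emoticons = [
    { 'string': ':)', 'img': '<img src="' + imgpath + 'emoticons/smile.png" class="emoticon"/>' },
    { 'string': ':O', 'img': '<img src="' + imgpath + 'emoticons/surprised.png" class="emoticon"/>' },
    { 'string': ':D', 'img': '<img src="' + imgpath + 'emoticons/happy.png" class="emoticon"/>' }
];

function parseEmoticons(text) {
    emoticons.forEach(function (emoticon) {
        // Escape every non-word character so ")" and friends are matched literally.
        var escaped = emoticon.string.replace(/\W/g, '\\$&');
        // (^|\s): start of string or a whitespace character (captured).
        // (?!\S): followed by whitespace or end of string (not consumed).
        var pattern = new RegExp('(^|\\s)' + escaped + '(?!\\S)', 'g');
        // Put the captured whitespace back in front of the image HTML.
        text = text.replace(pattern, function (m0, m1) { return m1 + emoticon.img; });
    });
    return text;
}

console.log(parseEmoticons(":D hello :) world"));
console.log(parseEmoticons("no:D here"));
```

The first call replaces both emoticons; the second leaves `:D` alone because it is glued to the preceding word.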

    Points of note:

    • (^|\s) checks for (and captures) the beginning of the string or a whitespace character
    • .replace(/\W/g, '\\$&') escapes all potential regex meta-characters in the emoticon (this means you'll probably have to change ':\\)' to ':)' in your emoticon list; otherwise the backslash itself gets escaped and is matched literally)
    • (?!\S) ("not followed by a non-space character") makes sure the emoticon is either followed by whitespace or end-of-string (we can't use the same trick at the beginning because JavaScript did not support look-behind when this was written; ES2018 added it, so modern engines also accept (?<!\S))
    • since we've potentially captured a space character at the beginning, we have to substitute it back in along with our HTML code
    • we could do that with .replace(pattern, '$1' + emoticon.img) but that will cause problems if emoticon.img ends up containing one of the special $ patterns that .replace understands and interprets
    • instead we go with a replacement function, which gets the whole matched string and the capture groups (and some other stuff) as arguments (but we only care about the first capture group)
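Since ES2018, JavaScript does support look-behind assertions, so in a modern engine the leading check can be written as `(?<!\S)` instead of capturing the whitespace. Nothing is consumed on either side, so there is nothing to put back, but a replacement function is still used so that `$` sequences in the image HTML are inserted literally. A sketch, assuming an ES2018+ runtime (a current browser or Node.js) and a hypothetical `replaceEmoticon` helper:

```javascript
// Requires ES2018+ look-behind support.
// Hypothetical helper: replaces one emoticon wherever it is surrounded by
// whitespace or the start/end of the string.
function replaceEmoticon(text, emoticon, img) {
    // Escape non-word characters so the emoticon is matched literally.
    var escaped = emoticon.replace(/\W/g, '\\$&');
    // (?<!\S): not preceded by a non-space; (?!\S): not followed by one.
    var pattern = new RegExp('(?<!\\S)' + escaped + '(?!\\S)', 'g');
    // A replacement function is used so "$&", "$1", etc. in img are not interpreted.
    return text.replace(pattern, function () { return img; });
}

// With a plain string replacement, "$&" would be expanded to the matched text;
// the function form inserts it verbatim.
console.log(replaceEmoticon("price :D", ":D", "<b>$&</b>"));
```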