JavaScript regex: unexpected behaviour when faking lookbehind


I am trying to code a widget that collates Tweets from multiple sources as an exercise (something similar exists here, but a) the list option offered there did not load any of my lists, and b) it is a useful learning exercise!). As part of this, I wanted to write a regex which replaces a Twitter handle ('@' followed by characters) with a link to the user's Twitter page. However, I did not want false positives for, for instance, an email address in a tweet.

So, for instance, the replacement should send

Hey there @twitteruser, my email address is [email protected]

to

Hey there <a href="http://twitter.com/twitteruser">@twitteruser</a>, my email address is [email protected]

Guided by this question, which suggested that I needed some way of replicating negative lookbehinds in JavaScript, I wrote the following code:

tweetText = tweetText.replace(/(\S)?@([^\s,.;:]*)/ig, function($0, $1){
    return $1 ? $0 + '@' + $1 : '<a href="http://www.twitter.com/' + $0 + '">@' + $0 + '</a>'
});

However, in the cases where the final part of the ternary operator is triggered, $0 contains the '@' symbol. This was unexpected for me - since the '@' was not enclosed in parentheses, I expected $0 to match '([^\s,.;:]*)' - that is, the username of the Twitter user (after, and without, the '@'). I can get the desired functionality by using $0.substring(1), but I would like to further my understanding.

Could someone please point out what I have misunderstood? I am quite new to regexes, have never written them in JavaScript, and have never used negative lookbehinds.


Solution
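
The surprise comes from how `replace` passes arguments to a callback: the first argument is always the entire matched substring, and the capture groups follow in order. So in the question's `function($0, $1)`, `$0` is the whole match (which includes the '@', plus the preceding character if there was one) and `$1` is the first group `(\S)`. The username group would be the third argument, which the function never declares. A minimal sketch that only records what the callback receives; the sample string, the `bob@example.com` address and the parameter names are made up for illustration:

```javascript
// Same pattern as in the question (the i flag is dropped; it has no effect here).
const pattern = /(\S)?@([^\s,.;:]*)/g;
const seen = [];

'Hey @twitteruser, mail me at bob@example.com'.replace(
    pattern,
    // Callback arguments: whole match, then group 1, then group 2.
    // The parameter names are arbitrary; only their order matters.
    function (match, before, name) {
        seen.push({ match: match, before: before, name: name });
        return match; // return the text unchanged; we only inspect the arguments
    }
);

console.log(seen);
// First entry:  match '@twitteruser', before undefined, name 'twitteruser'
// Second entry: match 'b@example',    before 'b',       name 'example'
```

With three parameters declared in that order, the original approach would work without `substring(1)`.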

  • In any case, instead of trying to match an optional non-space before the @, and rejecting the match if you find one, why not just require a space (or the beginning of the string) before the @?

    tweetText = tweetText.replace(
        /(^|\s)@([^\s,.;:]*)/g,
        '$1<a href="http://www.twitter.com/$2">@$2</a>'
    );
    

    Not only is this simpler, but it is likely to be faster too, since the regexp has to consider far fewer potential matches.
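
To check the behaviour, here is a small sketch that wraps the replacement above in a function and runs it on a tweet like the one in the question. The names `linkHandles` and `linkHandlesLookbehind` and the `me@example.com` address are hypothetical, chosen only for this example. The second function is an alternative rather than part of the answer above: it assumes an engine that supports real lookbehind (added to the language in ES2018), and uses `+` instead of `*` so that a lone '@' is not turned into an empty link.

```javascript
// Regex and replacement string taken from the answer above.
function linkHandles(tweetText) {
    return tweetText.replace(
        /(^|\s)@([^\s,.;:]*)/g,
        '$1<a href="http://www.twitter.com/$2">@$2</a>'
    );
}

// Alternative sketch: a genuine negative lookbehind, so the character
// before the '@' is checked but not consumed and needs no re-insertion.
function linkHandlesLookbehind(tweetText) {
    return tweetText.replace(
        /(?<!\S)@([^\s,.;:]+)/g,
        '<a href="http://www.twitter.com/$1">@$1</a>'
    );
}

const sample = 'Hey there @twitteruser, my email address is me@example.com';

console.log(linkHandles(sample));
// Hey there <a href="http://www.twitter.com/twitteruser">@twitteruser</a>, my email address is me@example.com
console.log(linkHandlesLookbehind(sample) === linkHandles(sample)); // true
```

The two versions differ on a bare '@' surrounded by spaces: the `*` version links it with an empty username, while the `+` version leaves it alone.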