
JavaScript - match non-ASCII characters using regex


I want to match all users mentioned in a comment. Example:

var comment = '@Agneš, @Petar, please take a look at this';
var mentionedUsers = comment.match(/@\w+/g);

console.log(mentionedUsers)

I'm expecting ["@Agneš", "@Petar"] but getting ["@Agne", "@Petar"]. As you can see, the š character is not matched.
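The truncation happens because `\w` in JavaScript regular expressions is defined as the ASCII-only class `[A-Za-z0-9_]`, so any letter outside that range ends the match. A short check, assuming a UTF-8 source file where `š` is the single code point U+0161:

```javascript
// In JavaScript regexes, \w is shorthand for the ASCII-only class [A-Za-z0-9_].
const wordChar = /^\w$/;

console.log(wordChar.test("s")); // true
console.log(wordChar.test("š")); // false: U+0161 is not in [A-Za-z0-9_]

// \w+ therefore stops right before "š", truncating the first mention.
const comment = "@Agneš, @Petar, please take a look at this";
console.log(comment.match(/@\w+/g)); // ["@Agne", "@Petar"]
```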

How can I match all letter characters, including non-ASCII ones?


Solution

  • Until ES6 Unicode support in regular expressions is available in your environment, you can work around it with something like:

    /@[^\s,]+/g
    

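    where you list the characters that can't appear in usernames. In ES2018 and later, Unicode property escapes (which require the u flag) let you match letters directly:

    /@[\p{L}\p{N}_]+/gu

    Note that the u flag alone does not make \w Unicode-aware: /@\w+/gu still matches only [A-Za-z0-9_], so it would still return "@Agne".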
    

    To avoid matching halves of email addresses, or other cases where the @ is in the middle of a word, you can call match(/[^\s,@]*@[^\s,@]+(?=[\s,]|$)/g) and then keep only the results that start with "@". An address such as foo@bar.com is matched as a whole token, so the filter discards it.
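A minimal sketch comparing the exclusion workaround and the email-filtering approach from this answer on the same input, plus a Unicode property escape variant. It assumes an ES2018-capable engine (for example Node.js 10+) for `\p{...}`. The helper names and the second sample string containing `petar@example.com` are hypothetical, invented for this illustration; `[\p{L}\p{N}_]` is used as a Unicode-aware stand-in for `\w`, which stays ASCII-only even with the `u` flag.

```javascript
// Hypothetical helper names, chosen for this example only.

// 1. Workaround: list what can't appear in a username (whitespace and commas).
function mentionsByExclusion(text) {
  return text.match(/@[^\s,]+/g) || [];
}

// 2. ES2018+: Unicode property escapes, which require the `u` flag.
//    \p{L} = any letter, \p{N} = any number; with "_" this mirrors \w.
function mentionsByUnicodeClass(text) {
  return text.match(/@[\p{L}\p{N}_]+/gu) || [];
}

// 3. Skip email addresses: match whole tokens (delimited by whitespace,
//    commas, or end of string) that contain one "@", then keep only the
//    tokens that start with "@".
function mentionsSkippingEmails(text) {
  const tokens = text.match(/[^\s,@]*@[^\s,@]+(?=[\s,]|$)/g) || [];
  return tokens.filter((token) => token.startsWith("@"));
}

const comment = "@Agneš, @Petar, please take a look at this";
const withEmail = "@Agneš, mail petar@example.com please";

console.log(mentionsByExclusion(comment));      // ["@Agneš", "@Petar"]
console.log(mentionsByUnicodeClass(comment));   // ["@Agneš", "@Petar"]
console.log(mentionsSkippingEmails(withEmail)); // ["@Agneš"]
console.log(mentionsByUnicodeClass(withEmail)); // ["@Agneš", "@example"]
```

The last line shows why the filter step matters: a plain mention pattern picks up `@example` from inside the address, while the token-then-filter version discards the whole address.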