javascript regex oniguruma

How to match a whole word containing special characters?


I need to match words using a single pattern. A word should match if it meets either of the following criteria:

  • it starts with a digit or an underscore, OR

  • it contains at least one special character (excluding underscore) anywhere in the word.

Should match

3testData
3test_Data
_testData
_test3Data
%data%
test%BIN%data
te$t&$#@daTa

Should NOT match

test_Data3

So far, I have managed to match some of them through:

[\p{^Alpha}]\S+

This works, except for the words where the special characters are inside the word.

3testData
3test_Data
_testData
_test3Data
%data%
test%BIN%data
te$t&$#@daTa


Solution

  • If lookbehinds are supported, you could use an alternation with two branches. The first branch matches a word that starts with a digit or an underscore. The second matches a word that contains at least one special character from a character class, with zero or more non-whitespace characters on either side of it.

    (?<=\s|^)(?:[\d_]\S+|\S*[%@#$]\S*)(?=\s|$)


    Explanation

    • (?<=\s|^) Positive lookbehind: assert that what is on the left is either a whitespace character or the start of the string
    • (?: Start of the non-capturing group
      • [\d_]\S+ Match a digit or an underscore, followed by one or more non-whitespace characters
      • | Or
      • \S*[%@#$]\S* Match zero or more non-whitespace characters, then one of the characters listed in the character class, then zero or more non-whitespace characters again
    • ) Close the non-capturing group
    • (?=\s|$) Positive lookahead: assert that what follows is a whitespace character or the end of the string
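
As a quick check, the pattern can be run against the sample words from the question. This is a minimal sketch that assumes a JavaScript engine with lookbehind support (ES2018 or later). The helper name `findWords` and the space-joined test string are made up for illustration and are not part of the original answer. Note that the character class only lists `%@#$`, so it would need extending if other characters should count as special.

```javascript
// Pattern from the answer; the g flag collects every match in the input.
const pattern = /(?<=\s|^)(?:[\d_]\S+|\S*[%@#$]\S*)(?=\s|$)/g;

// Sample words taken from the question.
const shouldMatch = [
  "3testData",
  "3test_Data",
  "_testData",
  "_test3Data",
  "%data%",
  "test%BIN%data",
  "te$t&$#@daTa",
];
const shouldNotMatch = ["test_Data3"];

// Hypothetical helper: return all whole words in `text` that satisfy the pattern.
function findWords(text) {
  return text.match(pattern) ?? [];
}

// Assumption: the words are separated by single spaces in one string.
const text = [...shouldMatch, ...shouldNotMatch].join(" ");
const found = findWords(text);

console.log(found);
```

With this input, `found` should contain the seven words from the "Should match" list and leave out `test_Data3`, because that word neither starts with a digit or underscore nor contains one of the listed special characters.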