javascript regex utf-8 regexp-replace

Regex filter for strings that accepts all UTF-8 letters and hyphens (-) but rejects all other non-letter characters


I would like a regex filter for strings in JavaScript, namely a rule or set of rules that accepts all letters from UTF-8 and rejects all non-letter characters, with the exception of - (hyphens).

For example, these should pass the filter:

abcd
ab-cd
müller
1248
ab99
straße
café
façade
São-Paulo
România
etc..

But not non-letter characters such as:

!"§$%&/()=?`>°^_<|#'@, etc

I tried several approaches with regex, but without success. Can you help me, please?


Solution

  • You could match letters and numbers with [\p{L}\p{N}]+ using the Unicode flag (u). If the hyphen should not appear at the start or end, optionally repeat that part, each time preceded by a single hyphen:

    ^[\p{L}\p{N}]+(?:-[\p{L}\p{N}]+)*$
    

    Regex demo

    const regex = /^[\p{L}\p{N}]+(?:-[\p{L}\p{N}]+)*$/gmu;
    const str = `abcd
    ab-cd
    müller
    1248
    ab99
    straße
    café
    façade
    São-Paulo
    România
    etc..
    !
    "
    §
    \$
    %
    &
    /
    (
    )
    =
    ?
    \`
    >
    °
    ^
    _
    <
    |
    #
    '
    @
    ,
    `;
    // Logs only the lines made up entirely of letters, digits, and inner hyphens
    console.log(str.match(regex));
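
To validate one string at a time rather than a multi-line block, the same pattern can be wrapped in a small helper. This is only a sketch: the names `NAME_PATTERN`, `isValidName`, and `stripDisallowed` are hypothetical, and the clean-up variant assumes that disallowed characters should simply be dropped. It also assumes the input uses precomposed characters (NFC), since a combining accent on its own is neither `\p{L}` nor `\p{N}`.

```javascript
// Same pattern as in the answer, but without the `g` and `m` flags:
// with `g`, RegExp.prototype.test() keeps state in lastIndex between calls.
const NAME_PATTERN = /^[\p{L}\p{N}]+(?:-[\p{L}\p{N}]+)*$/u;

// Hypothetical helper: true if the whole string consists of letters and
// digits, optionally joined by single inner hyphens.
function isValidName(input) {
  return NAME_PATTERN.test(input);
}

// Hypothetical clean-up variant (an assumption, not part of the answer):
// remove every run of characters that is not a letter, digit, or hyphen.
function stripDisallowed(input) {
  return input.replace(/[^\p{L}\p{N}\-]+/gu, "");
}

console.log(isValidName("ab-cd"));      // true
console.log(isValidName("-abcd"));      // false (leading hyphen)
console.log(isValidName("ab_cd"));      // false (underscore is not a letter)
console.log(stripDisallowed("ab!c_d")); // "abcd"
```

Note that `stripDisallowed` can still leave a leading, trailing, or doubled hyphen behind, so its result may need a further check with `isValidName`.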