I want to validate a string such as a name, meaning a string without spaces. For plain ASCII, the regex "^\w+$" would suffice, where ^ and $ make the match cover the whole string. I tried to achieve the same result for Unicode characters, to support multiple languages, using the \pL character class. But for some reason $ doesn't match the end of the string. What am I doing wrong?
Code sample is here: https://play.golang.org/p/SPDEbWmqx0N
I copy pasted random characters from: http://www.columbia.edu/~fdc/utf8/
go version go1.12.5 darwin/amd64
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Unicode character class
	fmt.Println(regexp.MatchString(`^\pL+$`, "testuser"))        // expected true
	fmt.Println(regexp.MatchString(`^\pL+$`, "user with space")) // expected false
	// Hindi script
	fmt.Println(regexp.MatchString(`^\pL+$`, "सकता")) // expected true, but returns false
	// Hindi script, without the $ anchor
	fmt.Println(regexp.MatchString(`^\pL+`, "सकता")) // expected true
	// Chinese
	fmt.Println(regexp.MatchString(`^\pL+$`, "我能")) // expected true
	// French
	fmt.Println(regexp.MatchString(`^\pL+$`, "ægithaleshâtifs")) // expected true
}
Actual result:
true <nil>
false <nil>
false <nil>
true <nil>
true <nil>
true <nil>
Expected result:
true <nil>
false <nil>
true <nil>
true <nil>
true <nil>
true <nil>
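The Hindi word "सकता" ends with the vowel sign ा (U+093E), which Unicode classifies as a combining mark (category M), not a letter (category L). So \pL+ stops just before it and $ cannot match there, which is also why the pattern without $ succeeds. You may use a character class that accepts both letters and marks: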
^[\p{L}\p{M}]+$
See Go demo.
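The demo link is not preserved here, so the following is a minimal sketch of such a demo, assuming the same five inputs as the question. The names nameRe and isValidName are hypothetical, chosen for illustration. Compiling the pattern once with regexp.MustCompile avoids recompiling it on every call, which regexp.MatchString does internally.

```go
package main

import (
	"fmt"
	"regexp"
)

// nameRe (hypothetical name) accepts one or more Unicode letters (\p{L})
// or combining marks (\p{M}) spanning the whole string; spaces, digits
// and punctuation are rejected.
var nameRe = regexp.MustCompile(`^[\p{L}\p{M}]+$`)

// isValidName is a hypothetical helper used for illustration.
func isValidName(s string) bool {
	return nameRe.MatchString(s)
}

func main() {
	inputs := []string{
		"testuser",
		"user with space",
		"सकता",
		"我能",
		"ægithaleshâtifs",
	}
	for _, s := range inputs {
		fmt.Printf("%q -> %t\n", s, isValidName(s))
	}
}
```

Only "user with space" should be rejected; the Hindi input now matches because its trailing vowel sign falls under \p{M}.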
Details
^ - start of string
[ - start of a character class that matches
\p{L} - any Unicode letter
\p{M} - any combining mark (diacritics, vowel signs and similar)
]+ - end of the character class, repeated 1 or more times
$ - end of string.

If you plan to also match digits and _ as \w does, add them to the character class: ^[\p{L}\p{M}0-9_]+$ (ASCII digits only) or ^[\p{L}\p{M}\p{N}_]+$ (any Unicode number).
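To compare the two digit-aware variants, here is a minimal sketch; the test strings (including the Devanagari digits "१२") and the variable names are assumptions for illustration. It also checks Go's \w, which the regexp/syntax documentation defines as ASCII-only ([0-9A-Za-z_]), which is why ^\w+$ cannot cover other scripts.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Letters and marks plus ASCII digits 0-9 and underscore.
	asciiDigits = regexp.MustCompile(`^[\p{L}\p{M}0-9_]+$`)
	// Letters and marks plus any Unicode number (\p{N}) and underscore.
	unicodeDigits = regexp.MustCompile(`^[\p{L}\p{M}\p{N}_]+$`)
	// Go's \w is ASCII-only: [0-9A-Za-z_].
	asciiWord = regexp.MustCompile(`^\w+$`)
)

func main() {
	for _, s := range []string{"user_42", "सकता_१२", "सकता"} {
		fmt.Printf("%q ascii=%t unicode=%t word=%t\n", s,
			asciiDigits.MatchString(s), unicodeDigits.MatchString(s), asciiWord.MatchString(s))
	}
}
```

Because 0-9 covers only ASCII digits, "सकता_१२" is rejected by the first pattern but accepted by the \p{N} variant, while ^\w+$ rejects both Hindi inputs.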