regex, jsonschema, regex-group

Regex for plain and accented characters, with spaces and dashes only inside the string


I'm trying to come up with a regex to use inside a JSON schema. The constraints:

  • the string is 0 to 100 characters long
  • it accepts letters and accented letters (no Greek characters, but œ must be included)
  • it accepts space, ' and - ONLY inside the string, not at the start or end

I have a brute-force approach:

"^((?![Ð×Þß÷þø])[a-zA-ZÀ-ÿŒœ]{1})((?![Ð×Þß÷þø])[-'a-zA-ZÀ-ÿŒœ ]{0,98})((?![Ð×Þß÷þø])[a-zA-ZÀ-ÿŒœ]{1})$"

This almost works: the character selection matches what I want, but it won't accept strings shorter than 2 characters. So, instead of creating three groups, is there a way to reject space, - and ' at the beginning and end of a single group?
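
The two-character minimum follows from the shape of the pattern: the first and last groups each consume exactly one letter, so nothing shorter can match. The minimal sketch below checks this with Python's `re` module, used here only for illustration (the names `BRUTE_FORCE` and `accepts` are made up for this example). It also suggests that each look-ahead guards only the first character of its own group, so an excluded character such as Þ can still slip through in the middle of the string.

```python
import re

# The brute-force pattern from the question, unchanged
# (split across lines only for readability).
BRUTE_FORCE = re.compile(
    r"^((?![Ð×Þß÷þø])[a-zA-ZÀ-ÿŒœ]{1})"
    r"((?![Ð×Þß÷þø])[-'a-zA-ZÀ-ÿŒœ ]{0,98})"
    r"((?![Ð×Þß÷þø])[a-zA-ZÀ-ÿŒœ]{1})$"
)


def accepts(text: str) -> bool:
    """Return True if the whole string matches the brute-force pattern."""
    return BRUTE_FORCE.fullmatch(text) is not None


print(accepts("Jean-Pierre"))  # True: separators only inside the string
print(accepts("-Jean"))        # False: leading dash
print(accepts("a"))            # False: the first and last groups need one letter each
print(accepts(""))             # False: same reason
print(accepts("aaÞa"))         # True: the look-ahead only tests the first character of the middle group
```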

Bonus question: while writing this, I realized that contiguous spaces or dashes are not desirable either...


Solution

  • You can do it with negative look-arounds: a negative look-ahead at the start of the group and a negative look-behind after it, each rejecting a space, ' or -. A second negative look-ahead, (?!.*[Ð×Þß÷þø]), excludes the unwanted characters anywhere in the string:

    ^((?!.*[Ð×Þß÷þø])(?![ '-])[-'a-zA-ZÀ-ÿŒœ ]{0,100})(?<![ '-])$
    

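
A quick way to check the pattern against the stated constraints is a small Python sketch like the one below. Python's `re` module is an assumption made for illustration; a JSON Schema validator may use a different regex engine, and look-behind support depends on that engine. The names `NAME_RE`, `NAME_NO_REPEAT_RE` and `is_valid` are made up for this example. The second pattern is not part of the answer above: it is one possible way to handle the bonus question, adding the look-ahead `(?!.*[ '-]{2})` so that any two adjacent separators (including mixed ones such as `-'`) are rejected.

```python
import re

# Pattern from the answer, unchanged.
NAME_RE = re.compile(
    r"^((?!.*[Ð×Þß÷þø])(?![ '-])[-'a-zA-ZÀ-ÿŒœ ]{0,100})(?<![ '-])$"
)

# Hypothetical variant for the bonus question (an assumption, not from the answer):
# an extra negative look-ahead rejects two separators in a row.
NAME_NO_REPEAT_RE = re.compile(
    r"^(?!.*[Ð×Þß÷þø])(?!.*[ '-]{2})(?![ '-])[-'a-zA-ZÀ-ÿŒœ ]{0,100}(?<![ '-])$"
)


def is_valid(text: str, pattern: re.Pattern = NAME_RE) -> bool:
    """Return True if the whole string matches the given pattern."""
    return pattern.fullmatch(text) is not None


samples = ["", "a", "Jean-Pierre", "O'Neil", "Œuvre", "-abc", "abc ", "aaÞa", "Jean  Pierre"]
for s in samples:
    print(repr(s), is_valid(s), is_valid(s, NAME_NO_REPEAT_RE))
```

With these samples, both patterns accept the empty string, single letters and names with interior separators, and reject leading or trailing separators and the excluded character Þ. Only the variant rejects "Jean  Pierre", which contains two consecutive spaces.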