Is there a way to require a space before an item of an alternation (an OR group) only when one of those items matches? The items can repeat inside the string.
REGEX:
^([A-ZÁÂÉÊÍÓÔÚ][a-záãâéêíóõôúç]+)([(e|da|das|de|do|dos)]*[\s][A-ZÁÂÉÊÍÓÔÚ][a-záãâéêíóõôúç]+)+$
A space before one of these items is mandatory: [(e|da|das|de|do|dos)]
Result I want:
Paulo César Oliveira
(this is working)
Antonio Carlos da Silva
(must have ONE space before "da")
João da Silva dos Santos e Souza
(must have ONE space before "da", "dos" and "e")
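Why the original pattern fails can be checked with a minimal Python sketch (Python and its built-in re module are chosen only for illustration; the question does not name a regex engine). Inside square brackets, (, | and ) are literal characters, so [(e|da|das|de|do|dos)]* matches any run of the characters ( ) | a d e o s, not one of the listed words. That class contains no space, so a lowercase word such as "da" standing between two spaces can never be matched.

```python
import re

# The question's pattern, unchanged. [(e|da|das|de|do|dos)] is a character
# class: it matches ONE of the characters ( ) | a d e o s, not a whole word.
ORIGINAL = re.compile(
    r"^([A-ZÁÂÉÊÍÓÔÚ][a-záãâéêíóõôúç]+)"
    r"([(e|da|das|de|do|dos)]*[\s][A-ZÁÂÉÊÍÓÔÚ][a-záãâéêíóõôúç]+)+$"
)

names = [
    "Paulo César Oliveira",
    "Antonio Carlos da Silva",
    "João da Silva dos Santos e Souza",
    "Antonio Carlos(da) Silva",  # hypothetical input: "(da)" is eaten by the class
]

results = {name: bool(ORIGINAL.match(name)) for name in names}
for name, ok in results.items():
    print(f"{name!r}: {ok}")
```

Only the first name and the made-up "Antonio Carlos(da) Silva" are accepted; the latter passes solely because the class matches the literal parentheses.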
You can use one of these patterns:
^\p{Lu}\p{Ll}+(?:(?:\s(?:e|d(?:[ao]s|[aeo])))?\s\p{Lu}\p{Ll}+)+$
^[A-ZÁÂÉÊÍÓÔÚ][a-záãâéêíóõôúç]+(?:(?:\s(?:e|d(?:[ao]s|[aeo])))?\s[A-ZÁÂÉÊÍÓÔÚ][a-záãâéêíóõôúç]+)+$
See the regex demo. \p{Lu} and \p{Ll} may be unsupported by your regex engine; in that case, keep using your character classes, as in the second pattern.
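The fallback described above can be sketched in Python (again an assumption about the engine; the question does not say which one is used). Python's built-in re module rejects \p{...} as a bad escape, so the sketch tries the Unicode version first, falls back to the character-class version, and then checks the three names from the question plus a few made-up inputs that should be rejected.

```python
import re

# Both versions of the answer's pattern, copied verbatim.
UNICODE_VERSION = r"^\p{Lu}\p{Ll}+(?:(?:\s(?:e|d(?:[ao]s|[aeo])))?\s\p{Lu}\p{Ll}+)+$"
CLASS_VERSION = (
    r"^[A-ZÁÂÉÊÍÓÔÚ][a-záãâéêíóõôúç]+"
    r"(?:(?:\s(?:e|d(?:[ao]s|[aeo])))?\s[A-ZÁÂÉÊÍÓÔÚ][a-záãâéêíóõôúç]+)+$"
)

try:
    NAME = re.compile(UNICODE_VERSION)
except re.error:
    # The built-in re module raises "bad escape \p", so use the
    # explicit character classes instead.
    NAME = re.compile(CLASS_VERSION)

samples = [
    "Paulo César Oliveira",
    "Antonio Carlos da Silva",
    "João da Silva dos Santos e Souza",
    "Antonio Carlos  da Silva",  # hypothetical: two spaces before "da"
    "Maria des Santos",          # hypothetical: "des" is not a connector
    "Antonio Carlos da",         # hypothetical: connector with no name after it
]

results = {s: bool(NAME.match(s)) for s in samples}
for s, ok in results.items():
    print(f"{s!r}: {ok}")
```

The three names from the question are accepted and the three made-up inputs are rejected. The third-party regex package does support \p{Lu} and \p{Ll}, so with it the first pattern compiles as written.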
Details:
^ - start of string
\p{Lu}\p{Ll}+ - an uppercase letter followed by one or more lowercase letters
(?:(?:\s(?:e|d(?:[ao]s|[aeo])))?\s\p{Lu}\p{Ll}+)+ - one or more occurrences of the following sequence:
    (?:\s(?:e|d(?:[ao]s|[aeo])))? - an optional occurrence of:
        \s - a whitespace character
        (?:e|d(?:[ao]s|[aeo])) - either e, or d followed by as, os, a, e or o (that is, e, da, das, de, do or dos)
    \s - a whitespace character
    \p{Lu}\p{Ll}+ - an uppercase letter followed by one or more lowercase letters
$ - end of string.
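The connector part of the breakdown, e|d(?:[ao]s|[aeo]), is a factored form of the question's e|da|das|de|do|dos. A small brute-force check in Python can confirm that on a limited input space: every string of one to three letters drawn from a, d, e, i, o, s. The alphabet and length limit are arbitrary choices for this sketch, so treat it as a spot check rather than a proof.

```python
import itertools
import re

# Factored connector from the answer vs. the plain word list from the question.
COMPACT = re.compile(r"e|d(?:[ao]s|[aeo])")
PLAIN = re.compile(r"e|da|das|de|do|dos")

# Every string of length 1..3 over a small alphabet containing the relevant letters.
candidates = [
    "".join(chars)
    for length in range(1, 4)
    for chars in itertools.product("adeios", repeat=length)
]

# fullmatch requires the whole candidate to match, not just a prefix.
compact_hits = {w for w in candidates if COMPACT.fullmatch(w)}
plain_hits = {w for w in candidates if PLAIN.fullmatch(w)}

print(sorted(compact_hits))        # ['da', 'das', 'de', 'do', 'dos', 'e']
print(compact_hits == plain_hits)  # True
```

Factoring out the shared d prefix only shortens the pattern; both forms accept the same six connectors.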