
Regex - Require a space before an item only when one of the OR alternatives matches


Is there a way to require a space before an item of an OR group only when one of its alternatives matches? The items can repeat within the string.

REGEX:

^([A-ZÁÂÉÊÍÓÔÚ][a-záãâéêíóõôúç]+)([(e|da|das|de|do|dos)]*[\s][A-ZÁÂÉÊÍÓÔÚ][a-záãâéêíóõôúç]+)+$

A space is mandatory before any one of these items: [(e|da|das|de|do|dos)]

Result I want:

Paulo César Oliveira (this is working)
Antonio Carlos da Silva (must have ONE space before "da")
João da Silva dos Santos e Souza (must have ONE space before "da", "dos" and "e")
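
A note on the attempt above: inside square brackets, `(`, `|` and `)` are literal characters, so `[(e|da|das|de|do|dos)]` is a character class that matches one single character, not one of the words. The following is a minimal Python sketch of that difference, assuming the standard `re` module; the variable names are only illustrative.

```python
import re

# The bracketed part of the original pattern, taken on its own.
# It is a character class: the set of single characters ( ) | a d e o s
char_class = re.compile(r"[(e|da|das|de|do|dos)]")

# The same alternatives written as a non-capturing group.
group = re.compile(r"(?:e|da|das|de|do|dos)")

for text in ["d", "|", "da", "dos"]:
    print(
        repr(text),
        "class:", bool(char_class.fullmatch(text)),
        "group:", bool(group.fullmatch(text)),
    )
```

The class accepts any one of its characters (including `|`), while the group accepts only the whole words.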


Solution

  • You can use

    ^\p{Lu}\p{Ll}+(?:(?:\s(?:e|d(?:[ao]s|[aeo])))?\s\p{Lu}\p{Ll}+)+$
    ^[A-ZÁÂÉÊÍÓÔÚ][a-záãâéêíóõôúç]+(?:(?:\s(?:e|d(?:[ao]s|[aeo])))?\s[A-ZÁÂÉÊÍÓÔÚ][a-záãâéêíóõôúç]+)+$
    

    \p{Lu} and \p{Ll} may be unsupported by your regex engine; in that case, keep using your explicit character classes (the second pattern).

    Details:

    • ^ - start of string
    • \p{Lu}\p{Ll}+ - an uppercase letter followed by one or more lowercase letters
    • (?:(?:\s(?:e|d(?:[ao]s|[aeo])))?\s\p{Lu}\p{Ll}+)+ - one or more occurrences of the following pattern sequence:
      • (?:\s(?:e|d(?:[ao]s|[aeo])))? - an optional occurrence of:
        • \s - a whitespace character
        • (?:e|d(?:[ao]s|[aeo])) - e, or d followed by either as/os or a, e, or o (that is, e, da, das, de, do or dos)
      • \s - a whitespace character
      • \p{Lu}\p{Ll}+ - an uppercase letter followed by one or more lowercase letters
    • $ - end of string.
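
To check the pattern against the names from the question, here is a minimal Python sketch. It uses the explicit character-class variant because Python's built-in `re` module does not support `\p{Lu}` / `\p{Ll}`. The helper name `is_valid_name` and the rejected sample strings are assumptions added for illustration, and the sketch assumes the accented letters are precomposed (NFC) characters.

```python
import re

# Explicit character-class variant of the pattern, copied from the answer
# and split across two string literals only for readability.
NAME_RE = re.compile(
    r"^[A-ZÁÂÉÊÍÓÔÚ][a-záãâéêíóõôúç]+"
    r"(?:(?:\s(?:e|d(?:[ao]s|[aeo])))?\s[A-ZÁÂÉÊÍÓÔÚ][a-záãâéêíóõôúç]+)+$"
)


def is_valid_name(name: str) -> bool:
    """Return True if the whole name matches the pattern (hypothetical helper)."""
    return NAME_RE.match(name) is not None


samples = [
    # Names from the question, expected to be accepted.
    "Paulo César Oliveira",
    "Antonio Carlos da Silva",
    "João da Silva dos Santos e Souza",
    # Illustrative strings that should be rejected.
    "Antonio Carlos  da Silva",  # two spaces before "da"
    "João daSilva",              # no space after "da"
    "Antonio da",                # connector not followed by a capitalized word
]

for name in samples:
    print(f"{name!r}: {is_valid_name(name)}")
```

Because each connector is consumed together with its leading `\s` inside the optional group, exactly one whitespace character is required before `e`, `da`, `das`, `de`, `do` or `dos`, and exactly one more before the capitalized word that follows.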