Add space between Persian numeric and letter with python re


I want to add a space between a Persian number and a Persian letter, like this:

"سعید123" should become "سعید 123"

In Java, this can be done as follows:

str.replaceAll("(?<=\\p{IsDigit})(?=\\p{IsAlphabetic})", " ")

But I can't find a Python solution.


Solution

  • A short regex can match the boundary between letters and digits (in any language):

    \d(?=[^_\d\W])|[^_\d\W](?=\d)
    

    Breakdown:

    • \d Match a digit
    • (?=[^_\d\W]) that is followed by a letter (a word character other than a digit or underscore)
    • | Or
    • [^_\d\W] Match a letter in any script
    • (?=\d) that is followed by a digit
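
To make the breakdown concrete, here is a small sketch that classifies single characters with the two building blocks. The names LETTER, DIGIT and classify are made up for this illustration, and it assumes Python 3, where \w and \d are Unicode-aware for str patterns by default.

```python
import re

# Hypothetical helper names, used only for this illustration.
# [^_\d\W] keeps word characters but drops digits and the underscore,
# which leaves (roughly) the letters of any script.
LETTER = re.compile(r'[^_\d\W]')
# \d matches any Unicode decimal digit, including Persian digits.
DIGIT = re.compile(r'\d')

def classify(ch):
    """Return 'letter', 'digit' or 'other' for a single character."""
    if LETTER.fullmatch(ch):
        return 'letter'
    if DIGIT.fullmatch(ch):
        return 'digit'
    return 'other'

for ch in ['س', 'a', '5', '۵', '_', '-']:
    print(repr(ch), classify(ch))
```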

    Python (text is the input string):

    re.sub(r'\d(?=[^_\d\W])|[^_\d\W](?=\d)', r'\g<0> ', text, flags=re.UNICODE)
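
As a runnable sketch of the lookahead pattern from the breakdown, the snippet below wraps it in a function. The names BOUNDARY and add_boundary_spaces are invented for this example, and it assumes Python 3 (where flags=re.UNICODE is already the default for str patterns, so it is omitted).

```python
import re

# Hypothetical names for illustration only.
# A digit followed by a letter, or a letter followed by a digit.
# The lookaheads do not consume the next character.
BOUNDARY = re.compile(r'\d(?=[^_\d\W])|[^_\d\W](?=\d)')

def add_boundary_spaces(text):
    """Insert one space at every letter/digit boundary in text."""
    # \g<0> re-emits the matched character, then a space is appended.
    return BOUNDARY.sub(r'\g<0> ', text)

name = 'سعید'
print(add_boundary_spaces(name + '123'))   # letters, then digits
print(add_boundary_spaces('123' + name))   # digits, then letters
```

Because the lookaheads are positive, a digit or letter at the very end of the string is not matched, so no trailing space is added.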
    

    However, according to another answer, the right way to accomplish this is to list the Persian letters explicitly:

    re.sub(r'\d(?=[آابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی])|[آابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی](?=\d)', r'\g<0> ', text, flags=re.UNICODE)
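
The explicit-alphabet variant can be sketched the same way. PERSIAN_LETTERS, PERSIAN_BOUNDARY and space_persian are hypothetical names chosen for this example; the letter list is copied from the pattern above, and Python 3 is assumed.

```python
import re

# Hypothetical names for illustration only; the letters are the ones
# listed in the pattern above.
PERSIAN_LETTERS = 'آابپتثجچحخدذرزژسشصضطظعغفقکگلمنوهی'

# The same character class is substituted into both alternatives.
PERSIAN_BOUNDARY = re.compile(
    r'\d(?=[{0}])|[{0}](?=\d)'.format(PERSIAN_LETTERS)
)

def space_persian(text):
    """Insert a space only where a digit meets a listed Persian letter."""
    return PERSIAN_BOUNDARY.sub(r'\g<0> ', text)

name = 'سعید'
print(space_persian(name + '123'))
print(space_persian('abc123'))  # Latin letters are not in the list
```

The trade-off is that any letter missing from the list, Latin letters for example, is left untouched, whereas the generic pattern treats every script alike.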