Farsi regular expression in Python

I have this Python regular expression code in Python 3 that I do not understand. I appreciate any help to explain what exactly it does with a few examples. The code is this:

# encoding=utf-8
import re
newline = re.sub(r'\s+(((زا(ی)?)?|ام?|ات|اش|ای?(د)?|ایم?|اند?)[\.\!\?\،]*)', r'\1 ', newline)

Solution

here is your regular expression:

\s+(((زا(ی)?)?|ام?|ات|اش|ای?(د)?|ایم?|اند?)[\.\!\?\،]*)

and here is a visualization:

Regular expression visualization

Debuggex Demo

Your replacement is r'\1 ' which means replace what you found with the 1st group followed by space. I don't read farsi, but here is another example:

\s+((a|b)[./?]*)

Regular expression visualization

Debuggex Demo

so let's execute some code:

>>> newline = '     a?    b?        a.'
>>> re.sub('\s+((a|b)[./?]*)', r'\1 ', newline)
'a? b? a. '

This eats extra spaces preceding a particular group of characters (the leading \s+) and changes it to the identified group 1 followed by one space (r'\1 ').