
Checking text pattern with search cursor and re.compile


I am working on a script that checks whether the text in each row of a field follows a specific format. I also want it to print the result for each checked record (row).

The text format looks like this:

000-00-0 NN 00
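A minimal sketch of this format as a regular expression, assuming each 0 stands for any digit and NN for two letters drawn from N, E, S and W (the character class the script below uses). It calls match(), which anchors at the start, and ends the pattern with \Z so nothing may follow the last two digits; that also works on the Python 2.7 that ships with ArcMap, where fullmatch() is not available. The function name follows_format and the sample values are hypothetical.

```python
import re

# Assumed reading of "000-00-0 NN 00": 0 = any digit, N = one of N, E, S, W.
# match() anchors at the start; \Z anchors at the very end of the string.
FORMAT = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9] [NESW]{2} [0-9]{2}\Z")


def follows_format(text):
    """Return True only if the whole string follows the format."""
    if text is None:  # a null field value cannot follow the format
        return False
    return FORMAT.match(text) is not None


# Hypothetical values: one valid value, then three with typos.
for sample in ["123-45-6 NE 78", "12-45-6 NE 78", "123 45 6-NE 7", "123-45-6 NE 789"]:
    print(sample, follows_format(sample))
```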

So if a row has a typo, such as 00-00-0 NN 00 or 000 00 0-NN 0 (or any other deviation), the script should print: "feature (feature #) does not match the pattern".
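To put the feature number into that message, one option (an assumption, not part of the script) is to request the OID@ token alongside the text field, so the cursor yields (ObjectID, text) pairs, and to fill the {0} placeholder with str.format(). In this sketch the checking logic is kept free of arcpy: rows is any iterable of such pairs, for example what arcpy.da.SearchCursor(inFC, ["OID@", field]) returns. The name report and the sample rows are hypothetical.

```python
import re

# Hypothetical stand-in for the expected format, anchored at both ends.
PATTERN = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9] [NESW]{2} [0-9]{2}\Z")


def report(rows, pattern=PATTERN):
    """Return one message per (oid, text) row.

    rows is assumed to yield (ObjectID, text) pairs, e.g. the output of
    arcpy.da.SearchCursor(inFC, ["OID@", field]).
    """
    messages = []
    for oid, text in rows:
        # Null values (None) cannot be matched, so they count as mismatches.
        if text is not None and pattern.match(text):
            messages.append("Feature {0} matches the pattern".format(oid))
        else:
            messages.append("Feature {0} does not match the pattern".format(oid))
    return messages


# Hypothetical rows standing in for cursor output.
sample_rows = [(1, "123-45-6 NE 78"), (2, "12-45-6 NE 78"), (3, None)]
for message in report(sample_rows):
    print(message)
```

Inside an ArcMap script tool, each message could be passed to arcpy.AddMessage or arcpy.AddWarning instead of print.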

Below is my script. It runs without errors but gives the wrong result: it reports every feature in the field as not matching, while most of them actually do match. I must be missing something very simple here...

with arcpy.da.SearchCursor(inFC, field) as rows:
    for row in rows:
        if row[0] == re.compile("^([0-9]{3})[a-]([0-9]{2})[a-]([0-9]{1})[\s]([NESW]{2})[\s]([0-9]{2})*$"):
            arcpy.AddWarning("Feature {0} matches the pattern")
        else:
            arcpy.AddMessage("Feature {0} does not match the pattern")
del row, rows
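The symptom can be reproduced in plain Python, without ArcGIS. A string never compares equal to a compiled pattern object, so the if branch in the script above can never run and every row falls through to else; the pattern's match() method tests the text instead. The sketch also shows two side effects of the script's pattern: [a-] accepts the letter a as well as a hyphen, and the trailing * lets the final two-digit group appear zero or more times. The sample value is hypothetical.

```python
import re

# The pattern from the script above, written as a raw string.
script_pattern = re.compile(
    r"^([0-9]{3})[a-]([0-9]{2})[a-]([0-9]{1})[\s]([NESW]{2})[\s]([0-9]{2})*$"
)
value = "123-45-6 NE 78"  # hypothetical value in the expected format

# A str never compares equal to a compiled Pattern object.
print(value == script_pattern)                  # False
# match() returns a Match object on success and None on failure.
print(script_pattern.match(value) is not None)  # True

# "[a-]" means "the letter a OR a hyphen"; "*" means "zero or more".
print(script_pattern.match("123a45a6 NE 78") is not None)  # True
print(script_pattern.match("123-45-6 NE ") is not None)    # True
```

A stricter version would use a literal hyphen in place of [a-] and drop the trailing *.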

Solution

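  • re.compile creates a pre-compiled regular expression (pattern) object, not a string. Comparing a field value to it with == is therefore always False, which is why every feature ends up in the else branch. Test the text with the pattern object's match() method instead; see the Python re module documentation on Regular Expression objects.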

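    You should also move the re.compile call outside the for loop and reuse the object. Compiling a frequently used pattern once avoids handling it again on every iteration. Python's re module does cache recently compiled patterns, so compiling inside the loop mostly costs a cache lookup per row rather than a full recompile, but compiling once up front is still cleaner and slightly faster.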

    For example:

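    import re

    # Compile once, before the loop, and reuse the pattern object.
    # Raw strings (r"...") keep "\s" from being read as a string escape.
    pattern = re.compile(
        r"^([0-9]{3})[a-]([0-9]{2})[a-]"
        r"([0-9]{1})[\s]([NESW]{2})[\s]([0-9]{2})*$"
    )
    
    for row in rows:  # rows: the SearchCursor from the question
        # match() returns a Match object (truthy) on success, None otherwise.
        if pattern.match(row[0]):
            arcpy.AddWarning("Feature {0} matches the pattern")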