Improve regex in Python (2.7) to find short / incomplete UK postcodes


I have a little function that finds full UK postcodes (e.g. DE2 7TT) in strings and returns them accordingly.

However, I'd like to change it to ALSO return incomplete postcodes that consist of one or two letters followed by one or two digits (e.g. SE3, E2, SE45, E34).

i.e. it must collect BOTH forms of UK postcode (incomplete and complete).

The code is:

import re

def pcsearch(postcode):
    if bool(re.search('(?i)[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}', postcode)):
        postcode = re.search('(?i)[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}', postcode)
        postcode = postcode.group()
        return postcode
    else:
        postcode = "na"
        return postcode

What tweaks are needed to get this to ALSO work with those shorter, incomplete, postcodes?


Solution

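  • You could extend the pattern with an alternation, listing the full form first and the short form second, and wrap it in word boundaries so that strings such as E123 are rejected: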

    (?i)\b(?:[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}|[A-Z]{1,2}\d{1,2})\b
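To see what the word boundaries and the order of the alternatives contribute, here is a small sketch that compares the pattern above with two deliberately weakened variants. The names `FULL`, `SHORT`, `first_match`, `no_boundaries` and `short_first` are made up for this illustration and are not part of the original code.

```python
import re

# The two alternatives from the pattern above, split out for readability.
FULL = r"[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}"
SHORT = r"[A-Z]{1,2}\d{1,2}"

def first_match(pattern, text):
    """Return the first match of pattern in text, or "na" if there is none."""
    m = re.search(pattern, text)
    return m.group() if m else "na"

# The pattern as suggested: full form first, wrapped in word boundaries.
with_boundaries = r"(?i)\b(?:" + FULL + "|" + SHORT + r")\b"
# Hypothetical variant without \b: it can match a prefix of a longer token.
no_boundaries = r"(?i)(?:" + FULL + "|" + SHORT + ")"
# Hypothetical variant with the short form first: it wins over the full form.
short_first = r"(?i)\b(?:" + SHORT + "|" + FULL + r")\b"

print(first_match(with_boundaries, "E123"))     # na
print(first_match(no_boundaries, "E123"))       # E12
print(first_match(with_boundaries, "DE2 7TT"))  # DE2 7TT
print(first_match(short_first, "DE2 7TT"))      # DE2
```

Without `\b` the short alternative happily matches the `E12` prefix of `E123`, and with the short alternative listed first a complete postcode is cut down to its outward part, which is why the full form comes first.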
    

    The function can also be refactored to run the pattern only once and check the resulting match object:

    import re

    def pcsearch(postcode):
        pattern = r"(?i)\b(?:[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}|[A-Z]{1,2}\d{1,2})\b"
        match = re.search(pattern, postcode)
        if match:
            return match.group()
        else:
            return "na"

    strings = [
        "SE3",
        "E2",
        "SE45",
        "E34",
        "DE2 7TT",
        "E123",
        "SE222"
    ]

    for s in strings:
        print(pcsearch(s))
    

    Output

    SE3
    E2
    SE45
    E34
    DE2 7TT
    na
    na
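If the input may contain more than one postcode, one possible variation is to compile the pattern once and collect every match with `findall`. This is only a sketch: the names `POSTCODE_RE` and `pcsearch_all` are hypothetical, and returning a list instead of "na" is an assumption about what the caller wants.

```python
import re

# Same pattern as above, compiled once; re.IGNORECASE replaces the inline (?i).
POSTCODE_RE = re.compile(
    r"\b(?:[A-Z]{1,2}[0-9R][0-9A-Z]? [0-9][A-Z]{2}|[A-Z]{1,2}\d{1,2})\b",
    re.IGNORECASE,
)

def pcsearch_all(text):
    """Return every full or short postcode found in text (empty list if none)."""
    # The group is non-capturing, so findall returns the whole matches.
    return POSTCODE_RE.findall(text)

print(pcsearch_all("Deliver to DE2 7TT, then SE3 and E2; skip E123."))
# ['DE2 7TT', 'SE3', 'E2']
```

Because matching is case-insensitive, ordinary prose can occasionally look like a postcode (for example "or 2nd" fits the full form), so free text may need extra filtering.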