python, regex, python-re

Python regex for fixed character length


I am new to regex and am trying to create a simple one. So far I have the pattern below, which works for the format I'm trying to test against.

import re
pattern = '^\+61-' #to ensure that the given string starts with +61
x = re.search(pattern, '+61-457999999')
print(x)

Output:

<re.Match object; span=(0, 4), match='+61-'>

Next, I'd like to add a character-count check. I tried adding {} to the end of my pattern, but this doesn't seem to work. I tried various combinations:

E.g. '^\+61-{1}' seems to look for one occurrence of '+61-' at the beginning.

What would be an appropriate addition to the regex so that:

  1. The starting characters are always '+61-4'
  2. The length of the given input is always 13

This sounds like a simple question but I was unable to find an exact matching answer for Python and the scenario described.


Solution

  • A general solution would be to match the length with a lookahead: (?=^.{13}$). Full example:

    >>> bool(re.search(r"(?=^.{13}$)^\+61-", '+61-457999999'))
    True
    >>> bool(re.search(r"(?=^.{13}$)^\+61-", '+62-457999999'))
    False
    >>> bool(re.search(r"(?=^.{13}$)^\+61-", '+61-4579999999'))
    False
    >>> bool(re.search(r"(?=^.{13}$)^\+61-", '+61-45799999'))
    False
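
To illustrate how the lookahead in the examples above behaves, here is a minimal sketch. It shows that the lookahead checks the length without consuming any characters, and it points out one edge case: `$` also matches just before a trailing newline. The `\Z` variant is a suggested alternative, not part of the original answer.

```python
import re

s = "+61-457999999"

# The lookahead (?=^.{13}$) inspects all 13 characters but consumes none,
# so the reported match is still only the "+61-" prefix.
m = re.search(r"(?=^.{13}$)^\+61-", s)
print(m.span(), m.group())

# "$" also matches just before a trailing newline, so a 14-character
# string ending in "\n" passes the length check above.
loose = re.search(r"(?=^.{13}$)^\+61-", s + "\n") is not None

# "\Z" matches only at the very end of the string.
strict = re.search(r"(?=^.{13}\Z)^\+61-", s + "\n") is not None
print(loose, strict)
```

If input may carry a trailing newline, stripping it first or using `re.fullmatch` avoids the surprise.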
    

    You could also be more precise and match the format, assuming digits after the -:

    >>> bool(re.fullmatch(r"\+61-\d{9}", '+61-457999999'))
    True
    >>> bool(re.fullmatch(r"\+61-\d{9}", '+62-457999999'))
    False
    >>> bool(re.fullmatch(r"\+61-\d{9}", '+61-4579999999'))
    False
    >>> bool(re.fullmatch(r"\+61-\d{9}", '+61-45799999'))
    False
    

    Or use .{9} if you want to match anything for the remaining 9 characters after the starting substring.
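
The question asks for the prefix '+61-4' specifically, whereas the patterns above only check '+61-'. A minimal sketch of a stricter check, assuming the eight characters after '+61-4' must be digits; the names `AU_MOBILE` and `is_valid` are hypothetical:

```python
import re

# Hypothetical name. Prefix "+61-4" (5 chars) + 8 digits = 13 chars total.
AU_MOBILE = re.compile(r"\+61-4\d{8}")

def is_valid(s: str) -> bool:
    # fullmatch must cover the whole string, so no ^, $ or lookahead is needed.
    return AU_MOBILE.fullmatch(s) is not None

print(is_valid("+61-457999999"))   # starts with +61-4, 13 chars
print(is_valid("+61-557999999"))   # fifth character is not 4
print(is_valid("+61-4579999999"))  # 14 chars
print(is_valid("+61-45799999"))    # 12 chars
```

Because the pattern fixes both the prefix and the number of remaining characters, the total length of 13 follows without a separate check.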

    The reason '^\+61-{1}' doesn't work is that a quantifier applies only to the item immediately before it, here the single character -, so {1} asks for exactly one -. {1} is implicit after every unquantified character, so the pattern is no different from '^\+61-'.
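
The scope of the quantifier can be checked directly. A small sketch, using made-up test strings, that compares the pattern with and without {1} and shows what {2} actually repeats:

```python
import re

text = "+61-457999999"

# {1} binds only to the "-" before it, so both patterns match the same span.
plain = re.search(r"^\+61-", text)
with_one = re.search(r"^\+61-{1}", text)
same_span = plain.span() == with_one.span()

# {2} likewise repeats only the "-": it needs "+61--", not "+61-+61-".
double_dash = re.search(r"^\+61-{2}", "+61--4") is not None
repeated_prefix = re.search(r"^\+61-{2}", "+61-+61-") is not None

# To repeat the whole prefix, it has to be grouped first.
grouped = re.search(r"^(?:\+61-){2}", "+61-+61-") is not None

print(same_span, double_dash, repeated_prefix, grouped)
```

Neither form counts the characters of the whole input, which is why a lookahead, `fullmatch`, or `len` is needed for the length requirement.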

    As an aside, always use raw strings r"" for regex patterns in Python, so that backslashes reach the regex engine unchanged.
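
A short sketch of why raw strings matter, using `\b` as the example because it means different things to Python and to the regex engine. The sample text is made up:

```python
import re

# In a normal string literal, "\b" is the backspace character (one char).
# In a raw string, r"\b" is a backslash followed by "b" (two chars),
# which the regex engine reads as a word boundary.
normal_len = len("\b")
raw_len = len(r"\b")

raw_hit = re.search(r"\bcat", "a cat") is not None
plain_hit = re.search("\bcat", "a cat") is not None

print(normal_len, raw_len, raw_hit, plain_hit)
```

The non-raw pattern searches for a literal backspace followed by "cat", so it fails silently instead of raising an error.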

    As another aside, you're in Python so it's easy to check the string's length with len.
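
A minimal sketch of the non-regex approach, covering only the two requirements stated in the question (prefix and total length). The function name `has_expected_shape` is hypothetical:

```python
def has_expected_shape(s: str) -> bool:
    # Requirement 1: starts with "+61-4"; requirement 2: exactly 13 characters.
    return len(s) == 13 and s.startswith("+61-4")

print(has_expected_shape("+61-457999999"))
print(has_expected_shape("+61-4579999999"))
print(has_expected_shape("+62-457999999"))
```

Unlike the `\d` pattern, this does not check that the remaining characters are digits.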