
Matching consecutive digits in regex while ignoring dashes in python3 re


I'm working to advance my regex skills in Python, and I've come across an interesting problem. Let's say that I'm trying to match valid credit card numbers, and one of the requirements is that a number cannot contain the same digit 4 or more times in a row. 1234-5678-9101-1213 is fine, but 1233-3345-6789-1011 is not. I currently have a regex that works when there are no dashes, but I want it to work in both cases, or at least in a way that lets me use | to match either one. Here is what I have for consecutive digits so far:

validNoConsecutive = re.compile(r'(?!([0-9])\1{4,})')

I know I could do some sort of replace '-' with '', but in an effort to make my code more versatile, it would be easier as just a regex. Here is the function for more context:

import re

def isValid(number):
    validStart = re.compile(r'^[456]') # Starts with 4, 5, or 6
    validLength = re.compile(r'^[0-9]{16}$|^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}$') # is 16 digits long
    validOnlyDigits = re.compile(r'^[0-9-]*$') # only digits or dashes
    validNoConsecutive = re.compile(r'(?!([0-9])\1{4,})') # no consecutives over 3
    validators = [validStart, validLength, validOnlyDigits, validNoConsecutive]
    return all([val.search(number) for val in validators])

    
list(map(print, ['Valid' if isValid(num) else 'Invalid' for num in arr]))

I looked into excluding chars and lookahead/lookbehind methods, but I can't seem to figure it out. Is there some way to perhaps ignore a character for a given regex? Thanks for the help!


Solution

  • You can add the (?!.*(\d)(?:-*\1){3}) negative lookahead after ^ (start of string) to add the restriction.

    The ^(?!.*(\d)(?:-*\1){3}) pattern matches

    • ^ - start of string
    • (?!.*(\d)(?:-*\1){3}) - a negative lookahead that fails the match if, immediately to the right of the current location, there is
      • .* - zero or more characters other than line break characters, as many as possible
      • (\d) - Group 1: one digit
      • (?:-*\1){3} - three occurrences of zero or more - chars followed by the same digit as captured in Group 1 (\1 is an inline backreference to the Group 1 value).

    See the regex demo.
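
To make the breakdown above concrete, here is a minimal sketch that applies only the lookahead to the two numbers from the question plus one dash-free case. The names no_four_repeats, samples, and results are made up for illustration.

```python
import re

# The lookahead from the answer: fail if any digit occurs four times
# in a row, where dashes between the repeats are skipped over.
no_four_repeats = re.compile(r'^(?!.*(\d)(?:-*\1){3})')

samples = [
    "1234-5678-9101-1213",  # no digit repeated four times in a row
    "1233-3345-6789-1011",  # 33-33 is a run of four 3s across a dash
    "4444555566667777",     # runs of four without any dashes
]

# search() returns a match object (truthy) only when the lookahead passes.
results = {s: bool(no_four_repeats.search(s)) for s in samples}

for s, ok in results.items():
    print(s, "passes" if ok else "fails")
```

Because the pattern consists only of ^ and a lookahead, a successful match is empty; only whether it matched matters here.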

    If you want to combine this pattern with others, put the lookahead right after ^. If the combined pattern has other capturing groups before this one, adjust the \1 backreference to the new group number. E.g., combining it with your second regex, validLength = re.compile(r'^[0-9]{16}$|^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}$'), it will look like

    validLength = re.compile(r'^(?!.*(\d)(?:-*\1){3})(?:[0-9]{16}|[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4})$')
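
As a rough sketch of how the whole check might collapse into one pattern, the version below also folds in the "starts with 4, 5, or 6" rule from the question. It uses a named group with a (?P=d) backreference instead of \1, which is one way to avoid renumbering when other capturing groups are added. The names CARD and is_valid and the sample numbers are hypothetical, not taken from the original post.

```python
import re

# Hypothetical single-pattern version of the question's validators.
CARD = re.compile(
    r'^(?!.*(?P<d>\d)(?:-*(?P=d)){3})'          # no digit 4+ times in a row, dashes ignored
    r'[456]'                                     # first digit is 4, 5, or 6
    r'(?:[0-9]{15}|[0-9]{3}(?:-[0-9]{4}){3})$'   # 16 digits, plain or in dashed groups of 4
)

def is_valid(number: str) -> bool:
    """Return True if number passes every rule encoded in CARD."""
    return CARD.search(number) is not None

for num in ["4253625879615786", "5133-3367-8912-3456", "0525362587961578"]:
    print(num, "Valid" if is_valid(num) else "Invalid")
```

The named group refers to the captured digit by name, so the lookahead keeps working regardless of how many groups come before it.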