
Match a number of letters based on how many numbers are matched using Regex


I'm trying to create a regex pattern to match account ids that follow certain rules. The matching will happen in a Python script using the re library, but I believe this is mostly a general regex question.

The account ids adhere to the following rules:

  1. Must be exactly 6 characters long
  2. The letters and numbers do not have to be unique

AND

  1. 3 uppercase letters followed by 3 numbers

OR

  1. 1 to 6 numbers followed by however many uppercase letters bring the length of the id to 6

So, the following would be 'valid' account ids:

ABC123
123456
12345A
1234AB
123ABC
12ABCD
1ABCDE
AAA111

And the following would be 'invalid' account ids:

ABCDEF
ABCDE1
ABCD12
AB1234
A12345
ABCDEFG
1234567
1
12
123
1234
12345

I can match the 3 letters followed by 3 numbers very simply, but I'm having trouble working out how to write a regex that matches a variable number of letters: if x is the number of numbers in the string, then the number of letters must be y = 6 - x.

I suspect that using lookaheads might help solve this problem, but I'm still new to regex and don't have an amazing grasp on how to use them correctly.

I have the following regex right now, which uses positive lookaheads to check if the string starts with a number or letter, and applies different matching rules accordingly:

((?=^[0-9])[0-9]{1,6}[A-Z]{0,5}$)|((?=^[A-Z])[A-Z]{3}[0-9]{3}$)

This matches the 'valid' account ids listed above, but it also matches the following strings, which should be invalid:

  • 1
  • 12
  • 123
  • 1234
  • 12345
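The over-matching can be reproduced with a short script. This is only a sketch for illustration: the names `ORIGINAL`, `too_short` and `wrongly_matched` are made up here, and the pattern is the one quoted above, unchanged.

```python
import re

# The pattern from the question, copied verbatim.
ORIGINAL = re.compile(
    r"((?=^[0-9])[0-9]{1,6}[A-Z]{0,5}$)|((?=^[A-Z])[A-Z]{3}[0-9]{3}$)"
)

# Strings that should be rejected because they are shorter than 6 characters.
too_short = ["1", "12", "123", "1234", "12345"]

# [0-9]{1,6}[A-Z]{0,5} bounds each part separately but never checks the
# combined length, so every one of these strings is accepted.
wrongly_matched = [s for s in too_short if ORIGINAL.match(s)]
print(wrongly_matched)
```

The same gap works in the other direction too: a 7-character string such as 12ABCDE also passes, since 2 numbers and 5 letters each fall inside their own quantifier.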

How can I change the first capturing group ((?=^[0-9])[0-9]{1,6}[A-Z]{0,5}$) to know how many letters to match based on how many numbers begin the string, if that's possible?


Solution

  • You could write the pattern as:

    ^(?=[A-Z\d]{6}$)(?:[A-Z]{3}\d{3}|\d+[A-Z]*)$
    

    Explanation

    • ^ Start of string
    • (?=[A-Z\d]{6}$) Positive lookahead: assert that exactly 6 chars, each A-Z or a digit, run from here to the end of the string
    • (?: Non capture group for the alternatives
      • [A-Z]{3}\d{3} Match 3 chars A-Z and 3 digits
      • | Or
      • \d+[A-Z]* Match 1+ digits and optional chars A-Z
    • ) Close the non capture group
    • $ End of string
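As a quick check, the pattern can be run against the valid and invalid ids listed in the question. This is a minimal sketch: `ACCOUNT_ID` and `is_account_id` are hypothetical names chosen for the example, and the two lists are copied from the question.

```python
import re

# Pattern from the answer: the lookahead fixes the total length at 6,
# then the alternation decides which layout is allowed.
ACCOUNT_ID = re.compile(r"^(?=[A-Z\d]{6}$)(?:[A-Z]{3}\d{3}|\d+[A-Z]*)$")


def is_account_id(candidate: str) -> bool:
    """Return True if candidate matches the account id pattern."""
    return ACCOUNT_ID.match(candidate) is not None


valid = ["ABC123", "123456", "12345A", "1234AB",
         "123ABC", "12ABCD", "1ABCDE", "AAA111"]
invalid = ["ABCDEF", "ABCDE1", "ABCD12", "AB1234", "A12345",
           "ABCDEFG", "1234567", "1", "12", "123", "1234", "12345"]

for account_id in valid + invalid:
    print(account_id, is_account_id(account_id))
```

Two Python details may matter depending on the input: for str patterns `\d` also matches non-ASCII digits unless `re.ASCII` is passed, and `$` still matches just before a trailing newline, which `ACCOUNT_ID.fullmatch(...)` would reject.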
