I'm trying to create a regex pattern to match account ids that follow certain rules. The matching will happen in a Python script using the re module, but I believe this is mostly a general regex question.
The account ids adhere to the following rules:
AND
OR
So, the following would be 'valid' account ids:
ABC123
123456
12345A
1234AB
123ABC
12ABCD
1ABCDE
AAA111
And the following would be 'invalid' account ids:
ABCDEF
ABCDE1
ABCD12
AB1234
A12345
ABCDEFG
1234567
1
12
123
1234
12345
I can match 3 letters followed by 3 digits easily, but I'm having trouble writing a regex that matches a variable number of letters, such that if x is the number of digits in the string, the number of letters is y = 6 - x.
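For reference, the relationship x + y = 6 can be spelled out literally by listing one alternative per possible digit count. The sketch below builds that alternation in Python. It is only an illustration: the names ACCOUNT_ID and is_valid_account_id are hypothetical, and it assumes ids use only uppercase A-Z and the digits 0-9, as in the examples above.

```python
import re

# One alternative per digit count x (1..6): x digits followed by exactly 6 - x letters.
# In the rf-string, "{{" and "}}" produce literal braces, so x=2 yields "[0-9]{2}[A-Z]{4}".
digit_first = "|".join(
    rf"[0-9]{{{x}}}[A-Z]{{{6 - x}}}" for x in range(1, 7)
)

# Hypothetical name: either 3 letters + 3 digits, or one of the digit-first alternatives.
ACCOUNT_ID = re.compile(rf"(?:[A-Z]{{3}}[0-9]{{3}}|{digit_first})")


def is_valid_account_id(s: str) -> bool:
    # fullmatch anchors at both ends, so the pattern itself needs no ^ or $.
    return ACCOUNT_ID.fullmatch(s) is not None


print(ACCOUNT_ID.pattern)
print(is_valid_account_id("12ABCD"), is_valid_account_id("12345"))  # True False
```

This is verbose compared with a lookahead-based pattern, but it makes the "letters = 6 - digits" rule explicit in the regex itself.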
I suspect that using lookaheads might help solve this problem, but I'm still new to regex and don't have a good grasp of how to use them correctly.
I have the following regex right now, which uses positive lookaheads to check if the string starts with a number or letter, and applies different matching rules accordingly:
((?=^[0-9])[0-9]{1,6}[A-Z]{0,5}$)|((?=^[A-Z])[A-Z]{3}[0-9]{3}$)
This matches the 'valid' account ids listed above; however, it also matches the following strings, which should be invalid:
How can I change the first capturing group, ((?=^[0-9])[0-9]{1,6}[A-Z]{0,5}$), so that the number of letters it matches depends on how many digits begin the string, if that's possible?
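To make the problem concrete, here is a quick check of the question's pattern against a few strings, assuming it is applied with re.match. Several entries from the invalid list above are accepted, because the digit-first branch never checks the total length and never ties the letter count to the digit count. The string "123ABCDE" is an extra example added here for illustration.

```python
import re

# The pattern from the question, unchanged.
question_pattern = re.compile(
    r"((?=^[0-9])[0-9]{1,6}[A-Z]{0,5}$)|((?=^[A-Z])[A-Z]{3}[0-9]{3}$)"
)

# None of these should be valid, but each one matches:
# [0-9]{1,6} and [A-Z]{0,5} vary independently, so any digit run of 1-6
# followed by 0-5 letters is accepted, whatever the total length.
candidates = ["1", "12", "123", "1234", "12345", "123ABCDE"]
false_positives = [s for s in candidates if question_pattern.match(s)]
print(false_positives)  # ['1', '12', '123', '1234', '12345', '123ABCDE']
```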
You could write the pattern as:
^(?=[A-Z\d]{6}$)(?:[A-Z]{3}\d{3}|\d+[A-Z]*)$
Explanation
^
    Start of string
(?=[A-Z\d]{6}$)
    Positive lookahead, assert 6 characters A-Z or digits until the end of the string
(?:
    Non-capture group for the alternatives
[A-Z]{3}\d{3}
    Match 3 characters A-Z and 3 digits
|
    Or
\d+[A-Z]*
    Match 1+ digits followed by optional characters A-Z
)
    Close the non-capture group
$
    End of string
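A minimal way to check the answer's pattern in Python is to run it against the valid and invalid examples from the question. The lookahead fixes the total length at 6, so the \d+[A-Z]* branch no longer needs to know how many letters to expect. The variable names below are just for illustration.

```python
import re

ANSWER_PATTERN = re.compile(r"^(?=[A-Z\d]{6}$)(?:[A-Z]{3}\d{3}|\d+[A-Z]*)$")

# Examples copied from the question.
valid = ["ABC123", "123456", "12345A", "1234AB", "123ABC", "12ABCD", "1ABCDE", "AAA111"]
invalid = ["ABCDEF", "ABCDE1", "ABCD12", "AB1234", "A12345", "ABCDEFG",
           "1234567", "1", "12", "123", "1234", "12345"]

for s in valid:
    assert ANSWER_PATTERN.match(s), s
for s in invalid:
    assert not ANSWER_PATTERN.match(s), s

# Python's $ also matches just before a trailing newline, so re.match accepts
# "ABC123\n"; re.fullmatch rejects it, which may matter for input read from files.
assert ANSWER_PATTERN.match("ABC123\n")
assert not ANSWER_PATTERN.fullmatch("ABC123\n")

print("all examples behave as expected")
```

One caveat: with str patterns in Python 3, \d matches any Unicode decimal digit, not just 0-9. If that matters, either write [0-9] instead of \d or compile with the re.ASCII flag.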