Search code examples
regexstring-matchingdigit

Regex Counting By 3s


I'm teaching myself regular expressions, and found a quizzing site that has been helping me find more applications for them and has been helping me expand my knowledge of how they work.

I found a question asking me to form a regex to match 10 digit numbers that are multiples of 3s. The only way I can think of doing this is by having the regex recognise numbers' values and be able to manipulate them mathematically. How is this possible?

In other words, what regex would match

0003
0006
0351
1749

but not match

0005
0011
0361
4372

Solution

  • First you need to start with the rule that a number is divisble by three if and only if the sum of its digits is divisible by three (proving this takes a little number theory, but it helps to see that 9, 99, 999 etc. are all multiples of three and therefore 1, 10, 100, 1000, etc. all contribute the same amount to the remainder of a number when divided by three).

    Then, notice that there are three kinds of digits:

    • The multiples of three: 0, 3, 6, and 9. These are equivalent to 0 (mod 3).
    • One more than multiples of three: 1, 4, and 7. These are equivalent to 1 (mod 3).
    • Two more than multiples of three: 2, 5, and 8. These are equivalent to -1 (mod 3). Conventionally this class is named 2, but -1 is more useful to us. Because 1 + 2 = 0 (mod 3), -1 is a legitimate name for 2.

    Then, the numbers 0, 3, 6, and 9 are multiples of three. If we add any number of digits from class 0 anywhere, the number remains a multiple of three (so 33, 999, and 963 are all multiples of three). If we add a digit from class 1 anywhere, we need to either add another digit of class -1, or add two more digits of class 1 to bring the remainder back to 0. Likewise if we add a digit from class -1 anywhere, we either need to add another digit of class 1, or add two more digits of class -1 to bring the remainder back to 0.

    Here's wcp's answer, formatted as a perl /x regex for readability:

    /
    (   [0369] # 0
      | [147] [0369]* [258] # 1 + 0 + -1 = 0
      | (   [258] # -1
          | [147] [0369]* [147] # 1 + 0 + 1 = -1
        ) # -1
        (   [0369] # 0
          | [258] [0369]* [147] # -1 + 0 + 1 = 0
        )* # 0
        (   [147] # 1
          | [258] [0369]* [258] # -1 + 0 + -1 = 1
        ) # 1 ... -1 + 0 + 1 = 0
    )+
    /x
    

    The regex matches groups of digits having remainder of 0. The first branch matches only digits of class 0; the second branch matches groups where the remainder goes up to 1 and then back to 0; and the third branch matches groups where the remainder goes down to -1 and then back to 0. There's some cleverness in how it's constructed (a less-golfed regex would have five major branches instead of three, I think), but the comments should be enough to let you follow it.