python recursion graph-theory discrete-mathematics induction

Getting the number of unique states in a graph

I've been trying to solve MIT's Mathematics for Computer Science problem sets and here's one of the problems:

Each monk entering the Temple of Forever is given a bowl with 15 red beads and 12 green beads. Each time the Gong of Time rings, a monk must do one of two things:

Exchange: If he has 3 red beads in his bowl, then he may exchange 3 red beads for 2 green beads.

Swap: He may replace each green bead in his bowl with a red bead and replace each red bead in his bowl with a green bead. That is, if he starts with i red beads and j green beads, then after he performs this operation, he will have j red beads and i green beads.

A monk may leave the Temple of Forever only when he has exactly 5 red beads and 5 green beads in his bowl.

There are sub-problems to this:

Prove that no one ever leaves the temple. This I proved using mathematical induction.
Prove that there are only a finite number of states that can be reached by this problem. This proof was also done by using mathematical induction and proving that the sum of number of red beads and green beads can only ever reduce or stay the same.
(This is where I am stuck) What is the true maximal number of unique states a monk can visit in any execution of the Temple of Forever machine?

After spending a considerable amount of time trying to think of sub-problem #3, I gave up and decided to write a program to count the number of unique states.

class Node:
    def __init__(self, r, g):
        self.r = r
        self.g = g

    def swap(self):
        if self.g <= 0 or self.r <= 0:
            return None
        return Node(self.g, self.r)

    def exchange(self):
        if self.r >= 3:
            return Node(self.r - 3, self.g + 2)
        return None

    def __hash__(self):
        return hash((self.r, self.g))

    def __eq__(self, other):
        if self is None and other is None:
            return True
        if self is None or other is None:
            return False
        return self.r == other.r and self.g == other.g

    def __repr__(self):
        return "({}, {})".format(self.r, self.g)

start = Node(15, 12)
queue = []
graph = []
visited = set()
queue.append(start)

while (queue):
    curr = queue.pop(0)
    visited.add(curr)
    graph.append(curr)
    s = curr.swap()
    e = curr.exchange()

    for el in [s, e]:
        if el != None:
            if el not in visited:
                queue.append(el)

print(visited, len(visited))

The answer I get from my program is

{(6, 9), (16, 9), (0, 7), (2, 5), (8, 5), (5, 8), (10, 8), (10, 7), (16, 3), (5, 18), (0, 17), (14, 1), (8, 15), (10, 13), (4, 16), (9, 16), (7, 5), (14, 2), (13, 10), (3, 1), (6, 13), (20, 3), (3, 11), (4, 12), (10, 3), (6, 14), (7, 15), (18, 5), (3, 6), (8, 6), (4, 1), (9, 7), (6, 4), (11, 4), (16, 4), (5, 17), (11, 9), (0, 18), (14, 6), (13, 6), (19, 2), (18, 6), (1, 19), (15, 7), (0, 8), (4, 11), (3, 5), (4, 6), (9, 2), (5, 7), (4, 17), (11, 3), (7, 4), (14, 12), (12, 4), (19, 1), (3, 15), (1, 3), (5, 13), (3, 21), (11, 14), (12, 9), (18, 1), (15, 12), (2, 19), (3, 10), (1, 14), (8, 10), (9, 11), (3, 16), (8, 16), (11, 13), (0, 22), (17, 5), (6, 18), (7, 14), (12, 14), (19, 6), (15, 3), (2, 20), (1, 4), (0, 12), (1, 9), (4, 2), (2, 14), (9, 6), (5, 3), (6, 8), (11, 8), (16, 8), (14, 7), (13, 5), (1, 18), (2, 4), (9, 12), (4, 7), (9, 1), (12, 5), (15, 8), (0, 3), (2, 9), (8, 1), (5, 12), (3, 20), (10, 12), (6, 3), (9, 17), (7, 10), (12, 10), (13, 11), (1, 13), (8, 11), (2, 10), (0, 23), (17, 4), (6, 19), (14, 11), (12, 15), (7, 9), (13, 1), (17, 9), (15, 2), (20, 2), (0, 13), (21, 3), (1, 8), (2, 15), (5, 2), (10, 2)} 129

So, 129. But when I look at the solution of the problem set (for sub-problem #3), here is what it states

Each move in the sequence must be either an exchange or swap, since these are the only legal moves. Now, whenever the monk performs an exchange operation, the sum r + g decreases by one.

(r - 3) + (g + 2) = (r + g) - 1

In contrast, swaps do not have any effect on the sum. Furthermore, we know that the sum r + g must be at least 3 to perform an exchange operation. Therefore, there can be at most 25 exchange operations in the sequence.

Now consider swap operations: between each pair of exchanges in the sequence, there may be an unlimited number of swaps. However, only a single swap can take the monk to a new state: if at step k the monk is in state (r, g), then at step k + 2, he will return to the same state. Therefore, an upper bound on the number of unique states in any execution of the machine is 25 + 26 + 1 = 52 (if swaps are inserted at both the beginning and end of the sequence).

Where did my program go wrong? Is my understanding of the problem statement incorrect (wrt the program I've written)? Also, I don't really understand the solution they've given. Is there a better way to explain it? For example, one of the issues/things I don't understand about it is, the solution states that the sum of beads reduces by 1 with every exchange operation. And therefore we can get 25 new states with exchange operations. But every sum at each level of the graph can be expressed in multiple ways, yes? So there must be more states created from exchange operations? Here's a link to the full problem set and it's solution.

Solution

Edit: In the light of OP's own answer and our discussion in the comments, this was the key issue:

We have to distinguish between two different numbers:

The maximum number n of visited states in any one path the monk can take;
The overall number N of states the monk could reach.

Note that N is the cardinality of the union (taken over all possible paths) of the sets of states visited in any one path. This implies n <= N, and it's easy to see that these numbers are not equal. The MIT question asked about n, whereas OP's original code was designed to find N.

The quoted proof is correct, so "an upper bound on [n] is 25 + 26 + 1 = 52".

I tried a Monte Carlo approach to approximate N: Decide randomly whether to exchange or swap whenever there is a choice, repeat until the process oscillates between (2, 0) and (0, 2), and repeat all of this a huge number of times, while keeping track of all unique visited states.

However, this does not appear to be practical, because the number of possible paths is too large, so the number we get does not come close to N with any feasible number of iterations. The following code already took about 15 minutes on my machine.

import random

def swap(i, j):
    i, j = j, i
    return i, j

def exchange(i, j):
    i, j = i - 3, j + 2
    return i, j

x, y = 15, 12
visited = {(x, y)}

for run in range(1_000_000_000):
    while x + y > 2:
        if x < 3:
            x, y = swap(x, y)
        else:
            coinflip = random.randint(0, 1)
            if coinflip == 0:
                x, y = swap(x, y)
            else:
                x, y = exchange(x, y)
        visited = visited.union({(x, y)})

    x, y = swap(x, y)    
    visited = visited.union({(x, y)})

print('Visited states:', visited)
print('Number of visited states:', len(visited))

Visited states: {(18, 0), (4, 7), (1, 3), (3, 0), (0, 2), (4, 12), (11, 14), (2, 5), (0, 3), (8, 5), (5, 8), (15, 12), (8, 1), (16, 3), (5, 18), (1, 14), (14, 1), (3, 16), (8, 16), (4, 1), (12, 14), (2, 20), (0, 18), (2, 10), (1, 4), (1, 19), (4, 2), (17, 4), (5, 3), (14, 11), (4, 6), (15, 2), (20, 2), (16, 8), (4, 17), (11, 3), (3, 1), (7, 4), (14, 12), (1, 8), (12, 4), (2, 0), (19, 1), (5, 2), (2, 4), (10, 2)}

Number of visited states: 46

Update: Here's a plot of the full state space. N = 140

And here's a path visiting 52 states. The green X is the starting point, and every blue circle marks a visited state. Since we know from the quoted proof that n <= 52, this proves n = 52.