Search code examples
algorithmmathrandompermutationinverse

How to generate a pseudo-random involution?


For generating a pseudo-random permutation, the Knuth shuffles can be used. An involution is a self-inverse permutation and I guess, I could adapt the shuffles by forbidding touching an element multiple times. However, I'm not sure whether I could do it efficiently and whether it generates every involution equiprobably.

I'm afraid, an example is needed: On a set {0,1,2}, there are 6 permutation, out of which 4 are involutions. I'm looking for an algorithm generating one of them at random with the same probability.

A correct but very inefficient algorithm would be: Use Knuth shuffle, retry if it's no involution.


Solution

  • Let's here use a(n) as the number of involutions on a set of size n (as OEIS does). For a given set of size n and a given element in that set, the total number of involutions on that set is a(n). That element must either be unchanged by the involution or be swapped with another element. The number of involutions that leave our element fixed is a(n-1), since those are involutions on the other elements. Therefore a uniform distribution on the involutions must have a probability of a(n-1)/a(n) of keeping that element fixed. If it is to be fixed, just leave that element alone. Otherwise, choose another element that has not yet been examined by our algorithm to swap with our element. We have just decided what happens with one or two elements in the set: keep going and decide what happens with one or two elements at a time.

    To do this, we need a list of the counts of involutions for each i <= n, but that is easily done with the recursion formula

    a(i) = a(i-1) + (i-1) * a(i-2)

    (Note that this formula from OEIS also comes from my algorithm: the first term counts the involutions keeping the first element where it is, and the second term is for the elements that are swapped with it.) If you are working with involutions, this is probably important enough to break out into another function, precompute some smaller values, and cache the function's results for greater speed, as in this code:

    # Counts of involutions (self-inverse permutations) for each size
    _invo_cnts = [1, 1, 2, 4, 10, 26, 76, 232, 764, 2620, 9496, 35696, 140152]
    
    def invo_count(n):
        """Return the number of involutions of size n and cache the result."""
        for i in range(len(_invo_cnts), n+1):
            _invo_cnts.append(_invo_cnts[i-1] + (i-1) * _invo_cnts[i-2])
        return _invo_cnts[n]
    

    We also need a way to keep track of the elements that have not yet been decided, so we can efficiently choose one of those elements with uniform probability and/or mark an element as decided. We can keep them in a shrinking list, with a marker to the current end of the list. When we decide an element, we move the current element at the end of the list to replace the decided element then reduce the list. With that efficiency, the complexity of this algorithm is O(n), with one random number calculation for each element except perhaps the last. No better order complexity is possible.

    Here is code in Python 3.5.2. The code is somewhat complicated by the indirection involved through the list of undecided elements.

    from random import randrange
    
    def randinvolution(n):
        """Return a random (uniform) involution of size n."""
    
        # Set up main variables:
        # -- the result so far as a list
        involution = list(range(n))
        # -- the list of indices of unseen (not yet decided) elements.
        #    unseen[0:cntunseen] are unseen/undecided elements, in any order.
        unseen = list(range(n))
        cntunseen = n
    
        # Make an involution, progressing one or two elements at a time
        while cntunseen > 1:  # if only one element remains, it must be fixed
            # Decide whether current element (index cntunseen-1) is fixed
            if randrange(invo_count(cntunseen)) < invo_count(cntunseen - 1):
                # Leave the current element as fixed and mark it as seen
                cntunseen -= 1
            else:
                # In involution, swap current element with another not yet seen
                idxother = randrange(cntunseen - 1)
                other = unseen[idxother]
                current = unseen[cntunseen - 1]
                involution[current], involution[other] = (
                    involution[other], involution[current])
                # Mark both elements as seen by removing from start of unseen[]
                unseen[idxother] = unseen[cntunseen - 2]
                cntunseen -= 2
    
        return involution
    

    I did several tests. Here is the code I used to check for validity and uniform distribution:

    def isinvolution(p):
        """Flag if a permutation is an involution."""
        return all(p[p[i]] == i for i in range(len(p)))
    
    # test the validity and uniformness of randinvolution()
    n = 4
    cnt = 10 ** 6
    distr = {}
    for j in range(cnt):
        inv = tuple(randinvolution(n))
        assert isinvolution(inv)
        distr[inv] = distr.get(inv, 0) + 1
    print('In {} attempts, there were {} random involutions produced,'
        ' with the distribution...'.format(cnt, len(distr)))
    for x in sorted(distr):
        print(x, str(distr[x]).rjust(2 + len(str(cnt))))
    

    And the results were

    In 1000000 attempts, there were 10 random involutions produced, with the distribution...
    (0, 1, 2, 3)     99874
    (0, 1, 3, 2)    100239
    (0, 2, 1, 3)    100118
    (0, 3, 2, 1)     99192
    (1, 0, 2, 3)     99919
    (1, 0, 3, 2)    100304
    (2, 1, 0, 3)    100098
    (2, 3, 0, 1)    100211
    (3, 1, 2, 0)    100091
    (3, 2, 1, 0)     99954
    

    That looks pretty uniform to me, as do other results I checked.