Search code examples
pythonalgorithmcombinatoricsdiscrete-mathematics

Why does this de Bruijn code always return 0s for the last few bits


I am computing cyclic de Bruijn sequences using the following code:

import sys
if len(sys.argv) > 1:
  n = int(sys.argv[1])  # get n from the command line
else:
  n = 4
N = 2**n
count = 0

def debruijn(x):
  if x.find(x[-n:]) < len(x)-n:  # check if last n chars occur earlier in the string
    return
  if len(x) == N+n-1:
    print(x[:N], x[N:])
    return
  debruijn(x+"0")
  debruijn(x+"1")


x = "0"*n
debruijn(x)
print("sequences")

This gives:

0000100110101111 000
0000100111101011 000
0000101001101111 000
0000101001111011 000
[...]

as the output. Why is x[N:] always equal to 000? There doesn't seem to be anything in the code that would guarantee this.


Posted to https://math.stackexchange.com/questions/3339778/why-does-searching-for-a-non-cyclic-de-bruijn-sequence-always-give-you-a-cyclic as requested by @Prune


Solution

  • It's a cyclic sequence: the last (n-1) bits, by definition, match the first (n-1) bits. x[:(n-1)] == x[-(n-1):]

    Since you forced your first digits to 0000, the "last" three are 000. Try changing your initial sequence and see:

    debruijn("0110")
    

    Output:

    0110000100111101 011
    0110000101001111 011
    0110000101111010 011
    0110000111101001 011
    0110010000111101 011
    0110010100001111 011
    0110010111101000 011
    0110011110100001 011
    0110100001011110 011
    0110100001111001 011
    0110100101111000 011
    0110100111100001 011
    0110101111000010 011
    0110101111001000 011
    0110111100001010 011
    0110111100101000 011
    

    Why does it work?

    Your only check in the code is to see whether the current last-4 sequence has been used already. Why does this suffice to guarantee wrap-around success?

    All by itself, it doesn't: you could easily begin with 00001000, consuming the 1000 sequence you want at the tail. However, if you do use up that sequence early, then you won't be able to extend the partial sequence to 16 bits. It's trivial with four matching bits: the 7-bit sequence 0001000 is now a dead end. It takes a little more work to prove it out for other starts, but it's the same general principle: the only ones that make it to the full 16 bits are forced to have saved needed wrap-around sequences.

    Look at the 0110 case to see the range of possibilities: the various solutions employ both 1-bit ends, all four 2-bit ends, and six of the eight 3-bit ends (100 and its complement, 011 won't work because of their combinatorial overlap with 0110).