Search code examples
cmathcombinationscracking

Number of possible sequences of 16 symbols are there given some restrictions


How many possible sequences can be formed that obey the following rules:

  1. Each sequence is formed from the symbols 0-9a-f.

  2. Each sequence is exactly 16 symbols long.

    0123456789abcdef    ok
    0123456789abcde     XXX
    0123456789abcdeff   XXX
    
  3. Symbols may be repeated, but no more than 4 times.

    00abcdefabcdef00    ok
    00abcde0abcdef00    XXX
    
  4. A symbol may not appear three times in a row.

    00abcdefabcdef12    ok
    000bcdefabcdef12    XXX
    
  5. There can be at most two pairs.

    00abcdefabcdef11    ok
    00abcde88edcba11    XXX
    

Also, how long would it take to generate all of them?


Solution

  • In combinatorics, counting is usually pretty straight-forward, and can be accomplished much more rapidly than exhaustive generation of each alternative (or worse, exhaustive generative of a superset of possibilities, in order to filter them). One common technique is to reduce a given problem to combinatons of a small(ish) number of disjoint sub-problems, where it is possible to see how many times each subproblem contributes to the total. This sort of analysis can often result in dynamic programming solutions, or, as below, in a memoised recursive solution.

    Because combinatoric results are usually enormous numbers, brute-force generation of every possibility, even if it can be done extremely rapidly for each sequence, is impractical in all but the most trivial of cases. For this particular question, for example, I made a rough back-of-the-envelope estimate in a comment (since deleted):

    There are 18446744073709551616 possible 64-bit (16 hex-digit) numbers, which is a very large number, about 18 billion billion. So if I could generate and test one of them per second, it would take me 18 billion seconds, or about 571 years. So with access to a cluster of 1000 96-core servers, I could do it all in about 54 hours, just a bit over two days. Amazon will sell me one 96-core server for just under a dollar an hour (spot prices), so a thousand of them for 54 hours would cost a little under 50,000 US dollars. Perhaps that's within reason. (But that's just to generate.)

    Undoubtedly, the original question has is part of an exploration of the possibility of trying every possible sequence by way of cracking a password, and it's not really necessary to produce a precise count of the number of possible passwords to demonstrate the impracticality of that approach (or its practicality for organizations which have a budget sufficient to pay for the necessary computing resources). As the above estimate shows, a password with 64 bits of entropy is not really that secure if what it is protecting is sufficiently valuable. Take that into account when generating a password for things you treasure.

    Still, it can be interesting to compute precise combinatoric counts, if for no reason other than the intellectual challenge.

    The following is mostly a proof-of-concept; I wrote it in Python because Python offers some facilities which would have been time-consuming to reproduce and debug in C: hash tables with tuple keys and arbitrary precision integer arithmetic. It could be rewritten in C (or, more easily, in C++), and the Python code could most certainly be improved, but given that it only takes 70 milliseconds to compute the count request in the original question, the effort seems unnecessary.

    This program carefully groups the possible sequences into different partitions and caches the results in a memoisation table. For the case of sequences of length 16, as in the OP, the cache ends up with 2540 entries, which means that the core computation is only done 2540 times:

    # The basis of the categorization are symbol usage vectors, which count the
    # number of symbols used (that is, present in a prefix of the sequence)
    # `i` times, for `i` ranging from 1 to the maximum number of symbol uses
    # (4 in the case of this question). I tried to generalise the code for different
    # parameters (length of the sequence, number of distinct symbols, maximum
    # use count, maximum number of pairs). Increasing any of these parameters will,
    # of course, increase the number of cases that need to be checked and thus slow
    # the program down, but it seems to work for some quite large values. 
    
    # Because constantly adjusting the index was driving me crazy, I ended up
    # using 1-based indexing for the usage vectors; the element with index 0 always
    # has the value 0. This creates several inefficiencies but the practical
    # consequences are insignificant.
    
    ### Functions to manipulate usage vectors
    
    def noprev(used, prevcnt):
        """Decrements the use count of the previous symbol"""
        return used[:prevcnt] + (used[prevcnt] - 1,) + used[prevcnt + 1:]
    def bump1(used, count):
        """Registers that one symbol (with supplied count) is used once more."""
        return ( used[:count]
               + (used[count] - 1, used[count + 1] + 1)
               + used[count + 2:]
               )
    def bump2(used, count):
        """Registers that one symbol (with supplied count) is used twice more."""
        return ( used[:count]
               + (used[count] - 1, used[count + 1], used[count + 2] + 1)
               + used[count + 3:]
               )
    def add1(used):
        """Registers a new symbol used once."""
        return (0, used[1] + 1) + used[2:]
    def add2(used):
        """Registers a new symbol used twice."""
        return (0, used[1], used[2] + 1) + used[3:]
    
    def count(NSlots, NSyms, MaxUses, MaxPairs):
        """Counts the number of sequences of length NSlots over an alphabet
           of NSyms symbols where no symbol is used more than MaxUses times,
           no symbol appears three times in a row, and there are no more than 
           MaxPairs pairs of symbols.
        """
        cache = {}
    
        # Canonical description of the problem, used as a cache key
        #   pairs: the number of pairs in the prefix
        #   prevcnt: the use count of the last symbol in the prefix
        #   used: for i in [1, NSyms], the number of symbols used i times
        #         Note: used[0] is always 0. This problem is naturally 1-based
        def helper(pairs, prevcnt, used):
            key = (pairs, prevcnt, used)
            if key not in cache:
                # avail_slots: Number of remaining slots.
                avail_slots = NSlots - sum(i * count for i, count in enumerate(used))
                if avail_slots == 0:
                    total = 1
                else:
                    # avail_syms:  Number of unused symbols.
                    avail_syms = NSyms - sum(used)
                    # We can't use the previous symbol (which means we need
                    # to decrease the number of symbols with prevcnt uses).
                    adjusted = noprev(used, prevcnt)[:-1]
                    # First, add single repeats of already used symbols
                    total = sum(count * helper(pairs, i + 1, bump1(used, i))
                                for i, count in enumerate(adjusted)
                                if count)
                    # Then, a single instance of a new symbol
                    if avail_syms:
                        total += avail_syms * helper(pairs, 1, add1(used))
    
                    # If we can add pairs, add the already not-too-used symbols
                    if pairs and avail_slots > 1:
                        total += sum(count * helper(pairs - 1, i + 2, bump2(used, i))
                                     for i, count in enumerate(adjusted[:-1])
                                     if count)
                        # And a pair of a new symbol
                        if avail_syms:
                            total += avail_syms * helper(pairs - 1, 2, add2(used))
                cache[key] = total
            return cache[key]
    
        rv = helper(MaxPairs, MaxUses, (0,)*(MaxUses + 1))
        #   print("Cache size: ", len(cache))
        return rv
    
    # From the command line, run this with the command:
    # python3 SLOTS SYMBOLS USE_MAX PAIR_MAX
    # There are defaults for all four argument.
    if __name__ == "__main__":
        from sys import argv
        NSlots, NSyms, MaxUses, MaxPairs = 16, 16, 4, 2
        if len(argv) > 1: NSlots   = int(argv[1])
        if len(argv) > 2: NSyms    = int(argv[2])
        if len(argv) > 3: MaxUses  = int(argv[3])
        if len(argv) > 4: MaxPairs = int(argv[4])
        print (NSlots, NSyms, MaxUses, MaxPairs,
               count(NSlots, NSyms, MaxUses, MaxPairs))
    

    Here's the result of using this program to compute the count of all valid sequences (since a sequence longer than 64 is impossible given the constraints), taking less than 11 seconds in total:

    $ time for i in $(seq 1 65); do python3 -m count $i 16 4; done
    1 16 4 2 16
    2 16 4 2 256
    3 16 4 2 4080
    4 16 4 2 65040
    5 16 4 2 1036800
    6 16 4 2 16524000
    7 16 4 2 263239200
    8 16 4 2 4190907600
    9 16 4 2 66663777600
    10 16 4 2 1059231378240
    11 16 4 2 16807277588640
    12 16 4 2 266248909553760
    13 16 4 2 4209520662285120
    14 16 4 2 66404063202640800
    15 16 4 2 1044790948722393600
    16 16 4 2 16390235567479693920
    17 16 4 2 256273126082439298560
    18 16 4 2 3992239682632407024000
    19 16 4 2 61937222586063601795200
    20 16 4 2 956591119531904748877440
    21 16 4 2 14701107045788393912922240
    22 16 4 2 224710650516510785696509440
    23 16 4 2 3414592455661342007436384000
    24 16 4 2 51555824538229409502827923200
    25 16 4 2 773058043102197617863741843200
    26 16 4 2 11505435580713064249590793862400
    27 16 4 2 169863574496121086821681298457600
    28 16 4 2 2486228772352331019060452730124800
    29 16 4 2 36053699633157440642183732148192000
    30 16 4 2 517650511567565591598163978874476800
    31 16 4 2 7353538304042081751756339918288153600
    32 16 4 2 103277843408210067510518893242552998400
    33 16 4 2 1432943471827935940003777587852746035200
    34 16 4 2 19624658467616639408457675812975159808000
    35 16 4 2 265060115658802288611235565334010714521600
    36 16 4 2 3527358829586230228770473319879741669580800
    37 16 4 2 46204536626522631728453996238126656113459200
    38 16 4 2 595094456544732751483475986076977832633088000
    39 16 4 2 7527596027223722410480884495557694054538752000
    40 16 4 2 93402951052248340658328049006200193398898022400
    41 16 4 2 1135325942092947647158944525526875233118233702400
    42 16 4 2 13499233156243746249781875272736634831519281254400
    43 16 4 2 156762894800798673690487714464110515978059412992000
    44 16 4 2 1774908625866508837753023260462716016827409668608000
    45 16 4 2 19556269668280714729769444926596793510048970792448000
    46 16 4 2 209250137714454234944952304185555699000268936613376000
    47 16 4 2 2169234173368534856955926000562793170629056490849280000
    48 16 4 2 21730999613085754709596718971411286413365188258316288000
    49 16 4 2 209756078324313353775088590268126891517374425535395840000
    50 16 4 2 1944321975918071063760157244341119456021429461885104128000
    51 16 4 2 17242033559634684233385212588199122289377881249323872256000
    52 16 4 2 145634772367323301463634877324516598329621152347129008128000
    53 16 4 2 1165639372591494145461717861856832014651221024450263064576000
    54 16 4 2 8786993110693628054377356115257445564685015517718871715840000
    55 16 4 2 61931677369820445021334706794916410630936084274106426433536000
    56 16 4 2 404473662028342481432803610109490421866960104314699801413632000
    57 16 4 2 2420518371006088374060249179329765722052271121139667645435904000
    58 16 4 2 13083579933158945327317577444119759305888865127012932088217600000
    59 16 4 2 62671365871027968962625027691561817997506140958876900738150400000
    60 16 4 2 259105543035583039429766038662433668998456660566416258886520832000
    61 16 4 2 889428267668414961089138119575550372014240808053275769482575872000
    62 16 4 2 2382172342138755521077314116848435721862984634708789861244239872000
    63 16 4 2 4437213293644311557816587990199342976125765663655136187709235200000
    64 16 4 2 4325017367677880742663367673632369189388101830634256108595793920000
    65 16 4 2 0
    
    real    0m10.924s
    user    0m10.538s
    sys     0m0.388s