Search code examples
algorithmmathbitstring

Project Euler #219


I'm trying to do project Euler number 219 but am failing to get a grasp of it. I'm trying to use Python which according to project Euler should be able to do it within a minute! This leads me to think that they can't possibly want me to compute each individual bitstring since that would be too slow in Python - there has to be a sub O(n) algorithm.

I have looked at a recursive solution which stores the bitstrings possible prefixes so that it can quickly pick a new bitstring and it even considers them in groups. This only works in brute-forcing values up to a bit over 10:

cost(1) = 1
cost(2) = 5
cost(3) = 11
cost(4) = 18
cost(5) = 26
cost(6) = 35
cost(7) = 44
cost(8) = 54
cost(9) = 64
cost(10)= 74
cost(11)= 85
cost(12)= 96

Past this, I am struggling to comprehend how reduce the problem. It is always possible to make a pattern that goes like:

1
01
001
0001
00001
00000

But it isn't optimal for more than 7 bitstrings. Can anyone guide me in what I should be considering?


Solution

  • Brute-force is not the right way to do it. This is one of those problems where, if you know a certain thing, it's not that hard, but if you've never heard of that thing, it's practically impossible. That thing is Huffman trees.

    [Edit] Upon further review, it seems that you can't quite build a Huffman tree on N nodes with certain frequencies, because the cost function for a string is 4*(# of 1's) + (# of 0's). If the cost function were the length of the string (or a multiple thereof), then you could create a Huffman tree.

    Any prefix-free code set can be represented as a Huffman-like binary tree, where each node has either 0 or 2 children, and the leaf nodes represent the codes. Given a tree with N nodes, we can construct a tree with N+1 nodes as follows:

    1. Choose the leaf node with the smallest cost, where the cost of a leaf node is 4*(# of 1's on path from root to leaf) + (# of 0's on path)
      • Add 2 children to that node

    Thus, if the code for the node was previously xxxx, then we remove that code from our code set (since it's no longer a leaf), and add the two codes xxxx0 and xxxx1. The total cost of the code set has now increased by

    `cost(xxxx0) + cost(xxxx1) - cost(xxxx) = cost(0) + cost(1) + cost(xxxx) = 5 + cost(xxxx)

    Hence, mincost(N+1) <= mincost(N) + 5 + cost(cheapest code in best solution for N). My theory is that that inequality should be an equality, but I haven't been able to prove it yet. For all of the values you've listed which you brute-forced, this statement is in fact an equality.

    If it is an equality, then to solve the problem what you would do is this:

    1. Start with an empty code set at a total cost of zero
      • Iterating from 1 to 109, do:
        1. Find the cheapest code in your code set
      • Split that code into two by appending 0 and 1
      • Add the cost of that code + 5 to the total cost

    If you use a priority queue, you should be able to do this in O(N log N) time. This may or may not be feasable, given the upper limit of 109.