I'm trying to do project Euler number 219 but am failing to get a grasp of it. I'm trying to use Python which according to project Euler should be able to do it within a minute! This leads me to think that they can't possibly want me to compute each individual bitstring since that would be too slow in Python - there has to be a sub O(n) algorithm.
I have looked at a recursive solution which stores the bitstrings possible prefixes so that it can quickly pick a new bitstring and it even considers them in groups. This only works in brute-forcing values up to a bit over 10:
cost(1) = 1
cost(2) = 5
cost(3) = 11
cost(4) = 18
cost(5) = 26
cost(6) = 35
cost(7) = 44
cost(8) = 54
cost(9) = 64
cost(10)= 74
cost(11)= 85
cost(12)= 96
Past this, I am struggling to comprehend how reduce the problem. It is always possible to make a pattern that goes like:
1
01
001
0001
00001
00000
But it isn't optimal for more than 7 bitstrings. Can anyone guide me in what I should be considering?
Brute-force is not the right way to do it. This is one of those problems where, if you know a certain thing, it's not that hard, but if you've never heard of that thing, it's practically impossible. That thing is Huffman trees.
[Edit] Upon further review, it seems that you can't quite build a Huffman tree on N nodes with certain frequencies, because the cost function for a string is 4*(# of 1's) + (# of 0's). If the cost function were the length of the string (or a multiple thereof), then you could create a Huffman tree.
Any prefix-free code set can be represented as a Huffman-like binary tree, where each node has either 0 or 2 children, and the leaf nodes represent the codes. Given a tree with N nodes, we can construct a tree with N+1 nodes as follows:
Thus, if the code for the node was previously xxxx, then we remove that code from our code set (since it's no longer a leaf), and add the two codes xxxx0 and xxxx1. The total cost of the code set has now increased by
`cost(xxxx0) + cost(xxxx1) - cost(xxxx) = cost(0) + cost(1) + cost(xxxx) = 5 + cost(xxxx)
Hence, mincost(N+1) <= mincost(N) + 5 + cost(cheapest code in best solution for N). My theory is that that inequality should be an equality, but I haven't been able to prove it yet. For all of the values you've listed which you brute-forced, this statement is in fact an equality.
If it is an equality, then to solve the problem what you would do is this:
If you use a priority queue, you should be able to do this in O(N log N) time. This may or may not be feasable, given the upper limit of 109.