Given an 8-bit byte, produce the sequence and count of "ones" in the binary representation of the byte (using a minimized lookup table)

I want to use a lookup table to solve the problem. When the table is built, I'd like to be able to do the following:

# lookup "0110 1001"
byte = 0b01101001
offset = offsets[byte]
count = counts[byte]
sequence = table[offset:offset+count]

print "Sequence for {:08b}: {:d} one(s) at positions".format(byte, count), sequence
>> Sequence for 01101001: 4 one(s) at positions [0, 3, 5, 6]

My current effort is straightforward and produces the correct results, but table is not as space-efficient as it could be. The following code produces a correct table that has 1024 items.

table = []
offsets = []
counts = []
current_offset = 0

for byte in range(256):
    b = [j for j in range(8) if byte & (1 << j)]
    table += b
    offsets.append(current_offset)
    current_offset += len(b)
    counts.append(len(b))

I know a much smaller table exists since...

The sub-lists in the list [0, 1, 2, 3, 4, 5, 6, 7] cover a number of possible values for byte.

0b0000 0001: offset=0, count=1, sequence = [0]
0b0000 0011: offset=0, count=2, sequence = [0, 1]
0b0000 0111: offset=0, count=3, sequence = [0, 1, 2] ...
0b1111 1111: offset=0, count=8, sequence = [0, 1, 2, 3, 4, 5, 6, 7]
0b0000 0010: offset=1, count=1, sequence = [1]
0b0000 0110: offset=1, count=2, sequence = [1, 2] ...

What I'm looking for is a way to construct the smallest table possible (in terms of number of elements). I should also say that table, offsets, and counts all need to be lists or sequences (no fancy Python objects), because my end goal is to hard-code the tables into a microcontroller.

Edit: Verifying the answer from user2357112. table is reduced to 320 elements!

# given an 8 bit byte in binary, return a list and count of ones.
table = []
offsets = []
counts = []

for byte in range(64):
    table += [0] + [j+1 for j in range(6) if byte & (1 << j)]+ [7]

for byte in range(256):
    b = [j for j in range(8) if byte & (1 << j)]
    counts.append(len(b))
    for i in xrange(len(table)):
        # should be len(table)-len(b), but I want to throw an 
        #IndexError exception if the pattern isn't found
        try:
            if table[i:i+len(b)] == b:
                offsets.append(i)
                break
        except IndexError:
            print b, " not found in table."
        except:
            rasie
# verify
good_count = 0
bad_count = 0
for byte in range(256):
    offset = offsets[byte]
    count = counts[byte]
    sequence = table[offset:offset+count]
    check = 0
    for bit in sequence:
        check = check | (1 << bit)
    if byte != check:
        print "Error: {:08b} != {:08b}".format(byte, check), sequence
        bad_count += 1
    else:
        good_count += 1
print "Good: {:d} Bad: {:d}".format(good_count, bad_count)
>> Good: 256 Bad: 0

Solution

If bit order matters, this actually turns out to be a lot easier than it first sounds, because you can just concatenate the sequences for all 8-bit numbers with the MSB and LSB set (so all sequences starting with 0 and ending with 7).

To see that this is optimal, first, note that there is no way to overlap two increasing sequences starting with 0 and ending with 7. Thus, all such sequences must be represented separately in our table.

Then, note that these sequences cover all the other sequences we need to represent, because any increasing sequence not starting with 0 or not ending with 7 is a subsequence of a sequence that does start with 0 and end with 7.

If [3, 4, 1] is as good as [1, 3, 4] for 0b00011010, then you can compress things further, but doing so is tricky. I don't have an easy way to find the optimal solution, but I do have beam search. Here's the length-81 table beam search gave:

[0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 4, 2, 5, 7, 3, 0, 1, 6, 2, 7, 4, 0, 5, 3, 1, 6, 4
, 0, 2, 5, 3, 6, 7, 1, 4, 3, 0, 2, 6, 7, 5, 1, 4, 6, 2, 3, 7, 1, 5, 0, 6, 4, 3,
7, 0, 2, 1, 5, 3, 4, 7, 2, 1, 5, 6, 3, 0, 7, 4, 6, 2, 5, 0, 7, 4, 5, 0, 1, 2, 6,
 3]

And here's the quick-and-dirty beam search I wrote up to find that:

# increase this to consume more memory in hopes of a better solution
# particularly low values may never find a solution
beam_width = 10000

# initialize the sequence to prevent wasting beam width on symmetric options
# I have no concrete proof that starting the sequence this way is optimal,
# but it's intuitively reasonable and it improved the solution quality.
initialseq = [0, 1, 2, 3, 4, 5, 6, 7]
initialset = set()
for i in range(9):
    for j in range(i, 9):
        initialset.add(frozenset(initialseq[i:j]))

seqs = [initialseq]
sets = [initialset]

while max(map(len, sets)) < 256:
    newseqs = []
    newsets = []
    for i, (seq, set_) in enumerate(zip(seqs, sets)):
        for newelem in range(8):
            newseq = seq + [newelem]
            newset = set_.copy()
            for slicestart in range(-8, 0):
                slice_ = newseq[slicestart:]
                if len(slice_) == len(set(slice_)):
                    newset.add(frozenset(slice_))
            newseqs.append(newseq)
            newsets.append(newset)
    seqs_and_sets = zip(newseqs, newsets)
    seqs_and_sets.sort(key=lambda x: len(x[1]), reverse=True)
    seqs_and_sets = seqs_and_sets[:beam_width]
    seqs, sets = zip(*seqs_and_sets)