Simpler Alternatives to Fenwick Trees

A Fenwick tree or "binary indexed tree" is a type of implicit data structure that can efficiently track prefix sums of an array as the values in the array change dynamically.

It is simple-ish and compact, and provides quite fast O(log n) updates (change a value in the underlying array) and queries (get the sum of the first k underlying elements). That makes it useful for a fair number of algorithms that I answer questions about on Stack Overflow.

However, I never provide Fenwick tree code in Stack Overflow answers, and I never write Fenwick trees in real life, because I would have to look up how to do it and/or think a lot about the implementation. There are a variety of simpler data structures with similar performance that I have used in practice.

Are there similarly simple alternative data structures that you use in practice? If so, what are they?

I am self-answering this question with my currently favorite alternative data structure, but I'm also interested in hearing about yours. I'll upvote and compliment any decent answers with code (because if you can't provide code then it isn't simple enough), and eventually accept whichever one I like best. It would be very cool if the one I like best isn't mine.

The requirements are for a simpler and similarly efficient data structure that would support the following operations:

Create an instance to track prefix sums in an N-element array, with all elements initially 0. This should be compact and take O(N) time and space.
update(i, x, y): track an update in that N-element array, where the value at index i is changed from x to y, where 0 <= i < N. This should take a quick O(log N) time and ideally require no allocations.
query(i): return the sum of the first i array elements, where 0 <= i <= N. This should also take a quick O(log N) time.

Note that my purpose in asking this question is so that I can link to the answer in other Stack Overflow answers, and actually provide code in those answers, instead of linking to a Fenwick tree and not providing code.

Solution

Fenwick Tree (but forget about the tree)

It turns out that Fenwick trees are much easier to understand and prove if you don't think of them as trees. It's so simple that I'll probably just code Fenwick trees from now on (thanks @templatetypedef). I'm a little disappointed that this answer still turned out longer than the other one, though, because of the need to explain the bit hacks.

Partial Sum Array

Given a subject array A of length N, that we want to track prefix sums of, a Fenwick tree is just an array F of N+1 partial sums, such that:

F[0] = 0. As an optimization you can skip storing this.
F[i] = sum_of_all(A[j]), where p(i) <= j < i

...where p(i) is the "prefix" of i formed by turning off its lowest 1 bit.

The sum of the first i elements of A is then just F[i] + F[p(i)] + F[p(p(i))] ... until you get to F[0].

This would actually work for any definition of p(i) that decreases to 0, but Fenwick's definition is special because it enables quick updates.

Updates

To update the array, we can just follow the above definition. If an element in the subject array changes like A[j] += x, then we need to update all the partial sums F[i] such that p(i) <= j < i.

For each 0 bit in j, there is exactly one pair of matching i and p(i). If j is xxx0yyy in binary, then i = xxx1000 works, and we have xxx0000 <= xxx0yyy < xxx1000.

So to find all the F[i] to update, we just need to find the 0 bits in j. If we didn't have a faster way, we could write the update like this:

for (bit = 1; bit <= N; bit <<= 1) {
    if (!(j&bit)) {
        i = (j & ~(bit-1)) | bit;
        F[i] += x;
    }
}

Bit Hacks

Fenwick tree implementations rely on a few bit hacks that simplify the code.

First, to calculate p(i), we can use p(i) = i & (i-1). This turns off the lowest bit 1 of i, because (i-1) flips the low-order 0 bits in i, as well as the lowest 1 bit due to borrow.

To find the 0 bits in j to update, two similar bit-hacks are used:

i = j+1 creates the first valid i, because it flips the low-order 1 bits to 0, and then sets the first 0 bit to 1 via carry.
i += (i & -i) finds next valid i by doing the same thing, but starting at is lowest 1 bit.

Putting it all together, the implementation is simple. This version, in python, is optimized to avoid storing F[0] by subtracting 1 from all the indexes in F:

def createFT(N):
    return [0]*N

def updateFT(F, index, delta):
    index += 1
    while index <= len(F):
        F[index-1] += delta
        index += index&-index; 

def queryFT(F, n):
    sum = 0
    while n > 0:
        sum += F[n-1]
        n &= n-1
    return sum

# test
test = [3, 5, 2, 7, 8, 1, 4]
ft = createFT(len(test))
for i in range(len(test)):
    updateFT(ft, i, test[i])
print("array: ", test)
print("tree: ", ft)
print("sums: ", [queryFT(ft, i) for i in range(len(test)+1)])