Search code examples
data-structuresrustbtreemap

Data Structure with Range Search Against Constantly Updating Keys


I have the need to store many data flows consisting of something like:

struct Flow {
    source: Address,
    destination: Address,
    last_seq_num_sent: u32,
    last_seq_num_rcvd: u32,
    last_seq_num_ackd: u32
}

I need to query by last_seq_num_rcvd. I can guarantee (with off-screen magic) the uniqueness of this field among all flows.

The flow may occur over unreliable connections, so some sequence numbers may get skipped due to network packet loss. I account for this by using a window, one which also guarantees uniqueness for its entire stretch. The rates of data flows are independent of each other, but have the ability to renumber their sequence numbers before collisions occur.

So the goal is to perform a range query against the flows to find any flow with a last_seq_num_rcvd within a WINDOW_SIZE constant's distance of some given next sequence number.

I gather the BTreeMap is appropriate here for its range query ability.

const WINDOW_SIZE = 10;
struct FlowValue { /* All original fields, minus last_seq_num_rcvd which now acts as key */ }

let mut flows = BTreeMap<u32, FlowValue>::new();

let query = 42;
for (k, v) in flows.range(Excluded(query), Included(query + WINDOW_SIZE)) {
    // This is how I would query for a flow
}

But now my key is something that changes often. It seems like there's no efficient way to update it in-place; it requires full deletion and reinsertion (under incremented key), which sounds like an expensive operation.

Is the BTreeMap method too expensive? Is there an alternative data structure that isn't? Or could I overload the BTreeMap to actually perform an efficient in-place increment of an integer key?


Solution

  • You're right that a B-Tree map is a little expensive for this application.

    Since the window size is constant, a faster implementation would be to partition the sequence numbers into buckets of size about WINDOW_SIZE/2. Then just put the flows into a hash table according to their rcvd bucket.

    To find flows for a particular packet, then, you only need to look up the 3 buckets that could possibly contain matching flows, and test each flow in the buckets. This will be faster than a B-Tree lookup.

    On update, the situation is even better, because you only need to update the hash table when an entry changes buckets, and that only happens every once every WINDOW_SIZE/2 packets.