algorithm · data-structures · hashtable

Why are tombstones needed in an open addressing hashtable?


For removal, I would just keep searching until I reach the last occupied cell, then swap it with the value I want to delete (if I encountered that value earlier) and clear the last cell. I'm not sure why tombstones are needed at all.


Solution

  • Because the same bucket can be part of the post-collision search chain for any number of hashed-to buckets. When you "clear the last cell" (bucket), every collision search chain that passes through that bucket then sees an empty bucket: if lookups follow the usual hash-table logic they stop searching there, never noticing content further along chains that hashed to other buckets; and if they don't stop there, every lookup becomes less efficient, because it has to use some other, more convoluted rule to decide when to stop.
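
    To make that concrete, here is a minimal sketch of a linear-probing lookup (my own illustration in Python; the EMPTY/TOMBSTONE sentinels and the find name are just assumptions for the example). It stops at a genuinely empty bucket but keeps probing past a tombstone - which is exactly the behaviour that naively clearing a mid-chain bucket would break:

        EMPTY = object()      # bucket never written to (or properly compacted)
        TOMBSTONE = object()  # bucket whose entry was deleted; chains stay intact

        def find(table, key):
            """Return the index of key in a linear-probing table, or None."""
            m = len(table)
            i = hash(key) % m
            for _ in range(m):
                slot = table[i]
                if slot is EMPTY:
                    return None                    # real hole: key can't be further along
                if slot is not TOMBSTONE and slot[0] == key:
                    return i                       # slot holds a (key, value) pair
                i = (i + 1) % m                    # tombstone or other key: keep probing
            return None

    A deletion then only has to overwrite the slot with TOMBSTONE, and an insertion can remember the first tombstone it passes and reuse it once the probe confirms the key isn't already present.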

    The simplest case - linear probing: you can find a cluster of elements and move them to the right places so future lookups work - it's just more proactive work that might or might not be useful later. Sometimes lazy is good (e.g. you might grow the table to the point where you rehash before the tombstones become problematically numerous). It's much harder - generally impractical - to compact if using more complicated search chains, even something fairly simple like quadratic probing.
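
    As an aside, a sketch of that lazy approach (again my own illustration, reusing the find/EMPTY/TOMBSTONE definitions from the sketch above): deletion just plants a tombstone, and a periodic rebuild - for example triggered when tombstones exceed some fraction of the table, or simply folded into the next growth rehash - discards them all at once:

        def delete(table, key):
            """Plant a tombstone instead of clearing the bucket."""
            i = find(table, key)
            if i is None:
                return False
            table[i] = TOMBSTONE            # chains passing through bucket i stay intact
            return True

        def rebuild(table):
            """Reinsert live entries into a fresh table, discarding all tombstones."""
            new = [EMPTY] * len(table)
            for slot in table:
                if slot is EMPTY or slot is TOMBSTONE:
                    continue
                i = hash(slot[0]) % len(new)
                while new[i] is not EMPTY:  # plain linear-probing insert
                    i = (i + 1) % len(new)
                new[i] = slot
            return new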

    To elaborate on the linear probing case, consider:

    bucket#      bucket# that the stored key hashed to
    ...
     9           <clear>
    10           10
    11           10
    12           11
    13           10
    14           14
    15           11
    16           10
    17           11
    18           10
    19           19
    20           <clear>
    ...
    

    As illustrated, the cluster of ten values in buckets [10..19] have keys that hashed to buckets 10, 11, 14 or 19.

    Say we want to delete the entry stored in bucket [15]. Hashing the key for it will take us to bucket [11], then we'll search linearly until we reach [15] and find a match. If we want to compact instead of using tombstones, we must continue searching from bucket [15] to [20] before we can know we've found the last key that hashes to [11] (namely, at [17]). Then how do we delete?

    Your idea was to move the content of bucket [17] to [15] and clear [17], but afterwards a linear probe for the key in bucket [18] (which hashed to [10]) would search from [10..17], see that [17] is now clear, and stop without ever inspecting bucket [18]. Similarly, inserting that same key again would place a duplicate at [17].
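
    That failure is easy to reproduce. In the sketch below (my own illustration, not part of the answer), keys are (label, home-bucket) pairs and the probe starts from the key's chosen home bucket, so we can build exactly the cluster shown above, apply the naive "move [17] into [15], clear [17]" deletion, and watch the key stored in bucket [18] become unreachable:

        def lookup(table, key):
            """Linear probing that stops at the first clear (None) bucket."""
            i = key[1] % len(table)                  # key = (label, home bucket)
            while table[i] is not None:
                if table[i] == key:
                    return i
                i = (i + 1) % len(table)
            return None

        m = 24
        table = [None] * m
        # Recreate the cluster from the diagram: stored bucket -> home bucket of its key.
        layout = {10: 10, 11: 10, 12: 11, 13: 10, 14: 14,
                  15: 11, 16: 10, 17: 11, 18: 10, 19: 19}
        for bucket, home in layout.items():
            table[bucket] = ("k%d" % bucket, home)

        # Naive deletion of the entry in [15]: move the last entry that also
        # hashed to [11] (it's in [17]) into [15], then clear [17].
        table[15] = table[17]
        table[17] = None

        print(lookup(table, ("k18", 10)))            # None - the key in [18] is lost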

    It is possible to compact things correctly to maintain the normal invariants for linear probing, but it's fiddly. The "rules" are:

    • we can't move something before the bucket it hashes to (so [19] can't be "compacted" into an earlier bucket)
    • if we clear a bucket, all values in the rest of the cluster (here [10..19]) that hash to a bucket above the bucket we clear must be stored above the bucket we clear, otherwise a probe for them will hit the cleared bucket and stop before reaching them

    In this case, we could "move" [18] to [15], leaving [18] clear: the key in [18] hashed to [10], so it may legally sit in [15], and nothing stored after [18] in the cluster hashes to a bucket at or before [18] (the key in [19] hashed to [19]).
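
    If you do want to compact on every delete with linear probing, the standard technique is a backward-shift deletion (the deletion-with-compaction algorithm usually credited to Knuth, not something from the question). The sketch below - again my own illustration, using (key, value) tuples and None for a clear bucket - walks forward from the hole and shifts back every entry whose probe would otherwise be cut off by it:

        def backward_shift_delete(table, key):
            """Delete key, compacting instead of using a tombstone.
            Assumes the table always keeps at least one clear (None) bucket."""
            m = len(table)
            i = hash(key) % m
            while table[i] is not None and table[i][0] != key:
                i = (i + 1) % m
            if table[i] is None:
                return False                         # key not present
            table[i] = None                          # open a hole at i
            j = i
            while True:
                j = (j + 1) % m
                if table[j] is None:
                    return True                      # end of the cluster: done
                k = hash(table[j][0]) % m            # home bucket of the entry at j
                # A probe for this entry runs k, k+1, ..., j; if that path passes
                # through the hole at i, the entry would become unreachable, so
                # shift it back into the hole and continue with the new hole at j.
                if (i <= j and not (i < k <= j)) or (i > j and not (k <= j or i < k)):
                    table[i] = table[j]
                    table[j] = None
                    i = j

    Applied to the cluster above, deleting the entry in [15] this way shifts [16], [17] and [18] back one bucket each and leaves [18] clear - a different, but equally valid, outcome to the single move just described.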

    Now, if you think about how all that worked, a lot of the analysis was possible because we knew all the relevant data was clustered between cleared buckets, in buckets [10..19]. Collision handling that's more elaborate - like quadratic probing - means the members of the collision chains around the bucket you're trying to delete from can be anywhere else in the table. The empty buckets no longer demarcate clusters of elements that all hashed to a quickly discernible range of buckets close to where you're hashing to for your deletion operation.

    In summary, this kind of compaction isn't easy or fast even for linear probing, the simplest collision-handling option, and it's impractical for most other collision-handling schemes.