I need to add the contents of one (small) Redis Sorted Set of size K to another (possibly large) Sorted Set of size N. I can guarantee the intersection of the sets is empty.
The most straightforward way to do so is a call to ZUNIONSTORE with both sets as sources and the large set as the destination. However, this has a documented worst-case performance on the order of O(N + K + (N+K)·log(N+K)). Meanwhile, the worst-case performance of combining ZRANGE and ZADD would only be O(log K + K + K·log N): O(log K + K) to read all K items out of the small set, plus O(log N) for each of the K insertions into the large set. K will usually be on the order of tens of items, while N could be tens of thousands. Does Redis optimize ZUNIONSTORE to the same as ZRANGE + ZADD when the destination is the same as one of the sources and one set is small, or does it always add all N + K records to a new set and then throw away the old one?
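For a concrete sense of the gap: with K = 50 and N = 50,000, the (N+K)·log(N+K) term alone works out to roughly 780,000 operations (using log base 2), while K·log N is roughly 800, about three orders of magnitude less.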
Background: I'm working on a project with a priority queue. The queue is implemented as a Redis Sorted Set. Multiple processes write to the queue, and periodically one thread of one process retrieves up to K items from the queue for processing.
To ensure we respond immediately to higher-priority items entering the Sorted Set after we start our query, we read the single highest-priority message K times instead of retrieving the highest K messages all at once.
Sometimes we need to return an item to the queue instead of processing it immediately. We can't add it back into the queue until we've checked all K items; otherwise, if we had to skip the first item in the queue, we would check that one item K times instead of checking K different items.
Currently we're copying each item we return to a separate "work list". Items that get processed are removed from both the priority queue and the work list. Items that need to be skipped are only removed from the priority queue. After we've checked K items, we copy anything left in the "work list" back to the priority queue. That's currently being done with a ZUNIONSTORE because it's the simplest option provided by Redis that accomplishes our goals.
I'm concerned about the worst-case performance of ZUNIONSTORE being worse than the ZRANGE + ZADD approach for large N. ZUNIONSTORE documents the semantics we need and avoids the network overhead of reading everything out of Redis only to immediately send it all back to be re-added. However, historical data suggests N could be much larger than K at peak usage.
If Redis recognizes this case and performs appropriately, I don't need to do anything. If not, I'll have to roll my own UNION code. Can I expect ZUNIONSTORE to perform similarly to ZRANGE + ZADD when one of the sources is also the destination, or do I have to explicitly add every element of the other source to the destination to get that behaviour?
Stack Overflow's mandated "what you've tried" field follows.
Pseudocode
Various processes add N items to the priority queue "queue"
"buffer" is initially empty
for i = 1 to K
    item = ZRANGE queue 0 0 WITHSCORES // Peek at the highest-priority item; ZRANGE indexes are 0-based.
    ZADD buffer item.score item.key // ZADD takes the score before the member.
    ZREM queue item.key // Remove item by key in case another thread added a higher-priority item.
    if (processItem(item))
        ZREM buffer item.key // Processed, so drop it from the work list too.
// All entries processed or skipped for future processing.
// Add skipped items in buffer back to queue.
ZUNIONSTORE queue 2 queue buffer // numkeys (2) is required; ZUNIONSTORE, not ZUNION, stores the result.
Alternative to the last line that I'm curious about:
bufferItems = ZRANGE buffer 0 -1 WITHSCORES // Read all skipped items together with their scores.
ZADD queue bufferItems // One variadic ZADD of score/member pairs.
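For concreteness, here's a minimal sketch of that loop in Python with redis-py. The function names drain_queue and process_item are placeholders of my own, the key names come from the pseudocode above, and it assumes a lower score means higher priority:

    import redis

    r = redis.Redis()

    def drain_queue(k, process_item):
        """Check up to k items, parking skipped ones in the "buffer" work list."""
        for _ in range(k):
            # Peek at the current highest-priority item (index 0 = lowest score).
            entries = r.zrange("queue", 0, 0, withscores=True)
            if not entries:
                break  # Queue drained early.
            member, score = entries[0]
            r.zadd("buffer", {member: score})  # Copy to the work list first.
            r.zrem("queue", member)  # Remove by key; a higher-priority item may have arrived.
            if process_item(member):
                r.zrem("buffer", member)  # Processed, so remove from the work list too.
        # Merge anything we skipped back into the queue, then clear the work list.
        r.zunionstore("queue", ["queue", "buffer"])
        r.delete("buffer")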
"Does Redis optimize ZUNIONSTORE to the same as ZRANGE + ZADD when the destination is the same as one of the sources and one set is small"
No. You can take a look at the Redis source code here. ZUNIONSTORE calls zunionInterDiffGenericCommand with a SET_OP_UNION argument. Next, look at zunionInterDiffGenericCommand here. Specifically, take a look at the SET_OP_UNION implementation here.
The code is straightforward, and you can see that no such optimization is taking place: the command always computes the full union into a new sorted set and then swaps it in as the destination, so all N + K elements are inserted regardless of whether the destination is one of the sources.
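So if N really is much larger than K, rolling your own union is reasonable. A minimal sketch in Python with redis-py, using your key names (merge_buffer_into_queue is my own name for it). Note that unlike ZUNIONSTORE this is two commands rather than one atomic operation, which should be fine since only one thread ever touches the buffer; a small Lua script would give you both atomicity and server-side execution if you need them:

    import redis

    def merge_buffer_into_queue(r, queue="queue", buffer="buffer"):
        # O(log K + K): read the K buffered items together with their scores.
        items = r.zrange(buffer, 0, -1, withscores=True)
        if items:
            # One variadic ZADD: K insertions into the big set, each O(log N).
            r.zadd(queue, dict(items))  # zrange returned (member, score) pairs.
            r.delete(buffer)  # Clear the work list for the next cycle.

    merge_buffer_into_queue(redis.Redis())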