Search code examples
redis

redis: what's the fastest way to intersect


I've been using sinter which does intersection of unordered integer sets. Is there any faster way of doing intersection given that I wouldn't mind sorting beforehand (or performing any other preprocessing)?

EDIT: Found some info [here][1]

EDIT2: Bounty for specific answer: is zinterstore faster than sinter? Benchmarking would be cool too.


Solution

  • Fast answer

    In theory the intersection of lists has complexity O(N) where N is the cardinality of the smallest set.

    Use SET (SINTER/SINTERSTORE) if have sparsed data / should keep RAM low (O(N*M)) and use BITSET(SETBIT/BITOP) in all other cases (O(N). Like in your edit one info.

    BITSET

    The Redis BIT key operations have complexity O(N), there N is the cardinality of the smallest key. And the bitops has very best execution speed based on CPU cache (look at bitops.c sources). So this can be the absolute winner in you have not sparsed data or if memory not important for you (here is more about strings in Redis).

    ZSET vs SET (zinterstore vs sinter)

    Do not use ZSET (zinterstore) is you have a plain list of integers and want to intersect them. Sorted set in Redis is a complex structure their keys stored in ziplist or skiplist internal encodings. The last one used to store sorted scores but keys stored in other structures. In case of ZSET intersection always much complicated with comparison SET:

    ZSET intersection: O(N * K) + O(M * log(M)) worst case with N being the smallest input sorted set, K being the number of input sorted sets and M being the number of elements in the resulting sorted set.

    SET intersection: O(N * M) worst case where N is the cardinality of the smallest set and M is the number of sets. Actually math-based minimum in theory.

    SET uses dict / intset data structures to store data and in your case (unordered integer sets) intset would be used. Intset the most memory save the structure in Redis. And have the best read speed with comparison to ziplist (doubly linked list, more about this data structure internals here).