algorithm, search-engine, information-retrieval, inverted-index

Union of inverted lists


Given k sorted inverted lists, I want an efficient algorithm to compute the union of these k lists. Each inverted list is a read-only array in memory, and each list contains integers in sorted order. The result will be saved in a preallocated array that is large enough. Is there any algorithm better than a k-way merge?


Solution

  • K-way merge is optimal. With a min-heap holding the current head of each list, it takes O(n log k) operations [where n is the number of elements in all lists combined].

    It is easy to see that it cannot be done better: as @jpalecek mentioned, otherwise you could sort any array faster than O(n log n) by splitting it into chunks [inverted indexes] of size 1 and merging them, which contradicts the lower bound for comparison-based sorting.

    • Note: This answer assumes it is important that the resulting array [the merged inverted index] is sorted. This assumption holds for most applications that use inverted indexes, especially in the Information-Retrieval area. Keeping indexes sorted allows elegant and quick intersection of indexes.
    • Note: A standard k-way merge keeps duplicates, so you have to make sure that an element appearing in two lists is added only once [easy to do by checking the last element in the target array before adding].
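
The approach described above (a k-way merge that skips duplicates by checking the last element written) could be sketched roughly as follows in Python. This is only an illustration under some assumptions: the function name `union_sorted_lists` and its parameters are made up for this example, the lists are plain Python lists of integers, and `out` is a preallocated list that is large enough, as in the question.

```python
import heapq


def union_sorted_lists(lists, out):
    """Write the sorted union of `lists` into `out`; return how many values were written.

    Hypothetical helper for illustration: `lists` are sorted and only read,
    `out` is assumed to be preallocated and large enough.
    """
    # Each heap entry is (value, list index, position within that list).
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)

    n_out = 0
    while heap:
        value, i, pos = heap[0]  # smallest head among all lists
        # Skip duplicates by comparing with the last element written.
        if n_out == 0 or out[n_out - 1] != value:
            out[n_out] = value
            n_out += 1
        pos += 1
        if pos < len(lists[i]):
            # Replace the heap top with the next element of the same list.
            heapq.heapreplace(heap, (lists[i][pos], i, pos))
        else:
            # This list is exhausted; drop it from the heap.
            heapq.heappop(heap)
    return n_out


lists = [[1, 3, 5, 7], [2, 3, 6], [], [1, 7, 8]]
out = [0] * 10
count = union_sorted_lists(lists, out)
print(out[:count])  # [1, 2, 3, 5, 6, 7, 8]
```

The heap never holds more than k entries, so each of the n elements costs O(log k), which matches the bound given in the answer. The duplicate check works because values leave the heap in non-decreasing order, so equal values always arrive next to each other.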