Search code examples
algorithmdata-structures

Most efficient algorithm to compute set difference?


What is the most efficient algorithm to compute the difference between A and B sets when both sets are sorted? How can I implement it with pseudocode?


Solution

  • Use an index in both arrays, and compare the values at those indices. If they are equal, they are not in the difference, and both indices can increment with 1. Otherwise the least value is only in one of the two sets, and belongs in the difference. In that case only the index of that value should increment with 1.

    Here is an implementation in basic JavaScript. It collects all values in one of the following result arrays:

    • onlyInA: values that only occur in A, not in B
    • onlyInB: values that only occur in B, not in A
    • inBoth: values that A and B have in common.

    It is assumed that the input arrays are sorted in ascending order.

    function difference(a, b) {
        let i = 0; j = 0;
        let onlyInA = [];
        let onlyInB = [];
        let inBoth = [];
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                onlyInA.push(a[i]);
                i++;
            } else if (a[i] > b[j]) {
                onlyInB.push(b[j]);
                j++;
            } else {
                inBoth.push(a[i]);
                i++;
                j++;
            }
        }
        onlyInA.push(...a.slice(i));
        onlyInB.push(...b.slice(j));
        return [onlyInA, onlyInB, inBoth];
    }
    
    // Demo run
    let a = [2,3,5,7,11,13,17,19,23,29,31,37];
    let b = [1,3,7,15,31];
    let [onlyInA, onlyInB, inBoth] = difference(a, b);
    console.log("only in A:", ...onlyInA);
    console.log("only in B:", ...onlyInB);
    console.log("in both:", ...inBoth);