Search code examples
pythonc++code-translation

Converting this python line to C++?


diff = list(set(map(tuple, paths1)) - set(map(tuple, paths2)))

Where paths1 and paths2 are list of list of pairs.

Example:

paths1 = [[(1,2),(2,3),(3,4)],[(1,3),(3,5)]]
paths2 = [[(5,2),(2,3),(3,4)],[(1,3),(3,5)]]
print(list(set(map(tuple, paths1)) - set(map(tuple, paths2))))

Should output [((1, 2), (2, 3), (3, 4))]. The inner list has to be converted to tuple first because this type of list couldn't be hashed into set.

In the provided C++ code below, I tried using the set_difference function from the standard library:

#include <iostream>
#include <vector>
#include <utility>
#include <algorithm>

int main () {

    std::vector < std::pair < int, int >>v1 =
        { std::make_pair (1, 2), std::make_pair (2, 3), std::make_pair (3, 4) };
    std::vector < std::pair < int, int >>v2 =
        { std::make_pair (5, 2), std::make_pair (2, 3), std::make_pair (3, 4) };
    std::vector < std::pair < int, int >>v3 =
        { std::make_pair (1, 3), std::make_pair (3, 5) };

    std::vector < std::vector < std::pair < int, int >>>V1 = { v1, v3 };
    std::vector < std::vector < std::pair < int, int >>>V2 = { v2, v3 };
    std::vector < std::vector < std::pair < int, int >>>diff;

    std::set_difference (V1.begin (), V1.end (), V2.begin (), V2.end (),
                std::inserter (diff, diff.begin ()));

    std::cout << "[";
    for (auto v : diff) {
        std::cout << "[";
        for (auto p : v)
            std::cout << "(" << p.first << "," << p.second << ")";
        std::cout << "]";
    }
    std::cout << "]\n";

} 

This code printed [[(1,2)(2,3)(3,4)][(1,3)(3,5)]]. Why is the 2nd inner list not removed when it should be?


Solution

  • Python sets are based on hashing. So, a Python set difference works by iterating the left set, looking each element up in the right set's hashmap, and skipping the ones that match.

    C++ sets are based on sorting (actually, binary search trees), not hashing. The same algorithm would work, but it would take (log-linear instead of linear time. So they use a different algorithm that does work in linear time: assuming both ranges are sorted, you can just walk the two in parallel.

    Accordingly, C++ set_difference only works on sorted ranges:

    Copies the elements from the sorted range [first1, last1) which are not found in the sorted range [first2, last2) to the range beginning at d_first.

    When you give it non-sorted ranges, it doesn't know you've done that, and tries to walk them in parallel, and gets confused. After the left list passes the (5, 2), it's already past all the other elements, so nothing else gets skipped.