diff = list(set(map(tuple, paths1)) - set(map(tuple, paths2)))
Where paths1
and paths2
are list of list of pairs.
Example:
paths1 = [[(1,2),(2,3),(3,4)],[(1,3),(3,5)]]
paths2 = [[(5,2),(2,3),(3,4)],[(1,3),(3,5)]]
print(list(set(map(tuple, paths1)) - set(map(tuple, paths2))))
Should output [((1, 2), (2, 3), (3, 4))]
. The inner list has to be converted to tuple first because this type of list couldn't be hashed into set.
In the provided C++ code below, I tried using the set_difference function from the standard library:
#include <iostream>
#include <vector>
#include <utility>
#include <algorithm>
int main () {
std::vector < std::pair < int, int >>v1 =
{ std::make_pair (1, 2), std::make_pair (2, 3), std::make_pair (3, 4) };
std::vector < std::pair < int, int >>v2 =
{ std::make_pair (5, 2), std::make_pair (2, 3), std::make_pair (3, 4) };
std::vector < std::pair < int, int >>v3 =
{ std::make_pair (1, 3), std::make_pair (3, 5) };
std::vector < std::vector < std::pair < int, int >>>V1 = { v1, v3 };
std::vector < std::vector < std::pair < int, int >>>V2 = { v2, v3 };
std::vector < std::vector < std::pair < int, int >>>diff;
std::set_difference (V1.begin (), V1.end (), V2.begin (), V2.end (),
std::inserter (diff, diff.begin ()));
std::cout << "[";
for (auto v : diff) {
std::cout << "[";
for (auto p : v)
std::cout << "(" << p.first << "," << p.second << ")";
std::cout << "]";
}
std::cout << "]\n";
}
This code printed [[(1,2)(2,3)(3,4)][(1,3)(3,5)]]
. Why is the 2nd inner list not removed when it should be?
Python sets are based on hashing. So, a Python set difference works by iterating the left set, looking each element up in the right set's hashmap, and skipping the ones that match.
C++ sets are based on sorting (actually, binary search trees), not hashing. The same algorithm would work, but it would take (log-linear instead of linear time. So they use a different algorithm that does work in linear time: assuming both ranges are sorted, you can just walk the two in parallel.
Accordingly, C++ set_difference
only works on sorted ranges:
Copies the elements from the sorted range
[first1, last1)
which are not found in the sorted range[first2, last2)
to the range beginning atd_first
.
When you give it non-sorted ranges, it doesn't know you've done that, and tries to walk them in parallel, and gets confused. After the left list passes the (5, 2)
, it's already past all the other elements, so nothing else gets skipped.