I am trying to understand the behavior of Erlang sets when I build them from anagrams. As I understand it, two anagrams should produce two identical sets of characters.
Set1 = sets:from_list("orchestra").
Set2 = sets:from_list("carthorse").
Set1 =:= Set2. %% true
However, sets:intersection returns a different set, which is not equal to either of the first two sets.
Intersection = sets:intersection(Set1, Set2).
Intersection =:= Set1. %% false
Intersection =:= Set2. %% false
Is there a particular reason for this behavior, based on how Set-Intersections are computed in Erlang? Many thanks in advance!
The implementation of the sets module does not guarantee that two sets compare equal with =:= even if they contain the same elements, because the internal data structure can differ. You could use operations like is_subset/2 or subtract/2 (relatively inefficient), or you could use to_list/1 and then lists:sort/1 to get two lists that can be compared directly.
But if you're starting from strings (lists of characters) anyway, you would be better off using ordsets right away. These are ordered lists which you can manipulate as sets, and which can be compared directly. For small-ish sets they are usually more efficient than sets anyway.
> Set1 = ordsets:from_list("orchestra").
"acehorst"
> Set2 = ordsets:from_list("carthorse").
"acehorst"
> Set1 =:= Set2.
true
> Intersection = ordsets:intersection(Set1, Set2).
"acehorst"
> Intersection =:= Set1.
true
> Intersection =:= Set2.
true
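If you do need to stay with the sets module, the two comparison approaches mentioned above could be sketched roughly as follows. This is only a minimal sketch: the module name set_compare and both function names are hypothetical, chosen for illustration, and only sets:is_subset/2, sets:to_list/1 and lists:sort/1 are real library calls.

```erlang
%% Hypothetical helper module, not part of OTP.
-module(set_compare).
-export([equal_by_subset/2, equal_by_sorted_list/2]).

%% Two sets hold the same elements if each is a subset of the other.
%% This does not depend on the internal representation of either set.
equal_by_subset(A, B) ->
    sets:is_subset(A, B) andalso sets:is_subset(B, A).

%% Alternatively, convert both sets to lists, sort them, and compare
%% the sorted lists, which have a single canonical form.
equal_by_sorted_list(A, B) ->
    lists:sort(sets:to_list(A)) =:= lists:sort(sets:to_list(B)).
```

With Set1, Set2 and Intersection built with the sets module as in the question, both set_compare:equal_by_subset(Intersection, Set1) and set_compare:equal_by_sorted_list(Intersection, Set2) should return true, because they compare elements rather than the underlying data structure.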