Erlang - String into Set


I am trying to understand the behavior of Erlang sets when I build them from strings that are anagrams of each other. As I understand it, two anagrams should produce two identical sets of characters.

Set1 = sets:from_list("orchestra"). 
Set2 = sets:from_list("carthorse"). 
Set1 =:= Set2. %% true

However, sets:intersection/2 returns a different set, which is not equal to either of the first two sets.

Intersection = sets:intersection(Set1, Set2). 
Intersection =:= Set1. %% false
Intersection =:= Set2. %% false

Is there a particular reason for this behavior, based on how Set-Intersections are computed in Erlang? Many thanks in advance!


Solution

  • The sets module does not guarantee that two sets compare equal with =:= even if they contain the same elements, because the internal data structure can differ. To compare them you could use operations like is_subset/2 or subtract/2 (relatively inefficient), or you could use to_list/1 and then lists:sort/1 to get two lists that can be compared directly. But if you're starting from strings (lists of characters) anyway, you would be better off using ordsets right away. These are ordered lists that you can manipulate as sets and compare directly. For small-ish sets they are usually more efficient than sets anyway.

    > Set1 = ordsets:from_list("orchestra").
    "acehorst"
    > Set2 = ordsets:from_list("carthorse").
    "acehorst"
    > Set1 =:= Set2.
    true
    > Intersection = ordsets:intersection(Set1, Set2).
    "acehorst"
    > Intersection =:= Set1.
    true
    > Intersection =:= Set2.
    true
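
If you need to stay with the sets module, the two comparison approaches mentioned above could be sketched roughly as follows. The module name set_compare and the function names equal_sorted/2, equal_subset/2 and demo/0 are made up for illustration; only the sets and lists calls come from the standard library.

```erlang
-module(set_compare).
-export([equal_sorted/2, equal_subset/2, demo/0]).

%% Hypothetical helper: compare two sets by converting each to a list
%% and sorting, so the internal layout of the sets no longer matters.
equal_sorted(A, B) ->
    lists:sort(sets:to_list(A)) =:= lists:sort(sets:to_list(B)).

%% Hypothetical helper: two sets hold the same elements exactly when
%% each is a subset of the other.
equal_subset(A, B) ->
    sets:is_subset(A, B) andalso sets:is_subset(B, A).

%% Reuses the anagram example from the question.
demo() ->
    Set1 = sets:from_list("orchestra"),
    Set2 = sets:from_list("carthorse"),
    Intersection = sets:intersection(Set1, Set2),
    {equal_sorted(Set1, Intersection), equal_subset(Set2, Intersection)}.
```

After compiling the module, set_compare:demo() should return {true, true}, because both helpers look only at the elements and not at how each set is stored.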