Search code examples
listperluniquexs

Data structure for unique element storage


I'm looking for a data structure which should preferably perform equal O(1)? for any number of elements when adding/removing/retrieving elements.

Here are some additional guidelines,

  • retrieving elements should not involve slow keys()
  • elements must be always unique and defined
  • element order is not significant
  • addition or removal of element should not involve iteration over other elements
  • gaps in retrieved list of elements are tolerable and can be represented with undef value

Please suggest better solution than,

sub uniqArrayFactory {
  my $members = [];
  my $seen = {};
  my $gaps = [];

  return sub {
    my (%arg) = @_;

    return $members if $arg{members};
    my $m;
    if (defined ($m = $arg{del})) {

      return if !$seen->{$m};
      ${ $seen->{$m} } = undef;
      push @$gaps, delete($seen->{$m});
    }
    elsif (defined ($m = $arg{add})) {

      return if $seen->{$m};
      if (@$gaps) {
        $seen->{$m} = pop @$gaps;
        ${ $seen->{$m} } = $m;
      }
      else {
        push @$members, $m;
        $seen->{$m} = \( $members->[-1] );
      }
    }
    return $m;
  };
}

UPDATE (usage)

my $fa = uniqArrayFactory();

$fa->(add => 10);
$fa->(del => 10);
my $members = $fa->(mebers => 1);

Solution

  • keys and each are surprisingly slow indeed. But if you store each element as a value of a hash and use values, things get a low faster. With

    use strict;
    use warnings;
    use Benchmark qw(:all);
    
    my $i;
    my $fa;
    my %hash;
    
    my %compare = (
      uarray => sub {
        $fa->(add => $i++);
        my $memb = $fa->(members => 1);
        for my $v (@$memb) { next if !defined $v; }
      },
      hash => sub {
        $hash{ $i } = $i;
        for my $v (values %hash) {}
        $i++;
      },
    );
    
    $i = 0; $fa = uniqArrayFactory(); %hash = ();
    cmpthese(10000, \%compare);
    
    sub uniqArrayFactory {
      my $members = [];
      my $seen = {};
      my $gaps = [];
    
      return sub {
        my (%arg) = @_;
    
        return $members if exists $arg{members};
        my $m;
        if (defined ($m = $arg{del})) {
    
          return if !$seen->{$m};
          ${ $seen->{$m} } = undef;
          push @$gaps, delete($seen->{$m});
        }
        elsif (defined ($m = $arg{add})) {
    
          return if $seen->{$m};
          if (@$gaps) {
            $seen->{$m} = pop @$gaps;
            ${ $seen->{$m} } = $m;
          }
          else {
            push @$members, $m;
            $seen->{$m} = \( $members->[-1] );
          }
        }
        return $m;
      };
    }
    

    I get:

             Rate   hash uarray
    hash   3205/s     --    -6%
    uarray 3401/s     6%     --