Search code examples
ruby

Does Ruby uniq preserve ordering?


The documentation doesn't say anything about that (http://www.ruby-doc.org/core-2.2.0/Array.html#method-i-uniq).

Also, is it using a naive O(n^2) search or something else like a hashmap ? In the latter case, should I understand that my elements must have a proper implementation of hash and eql? when I want to unicize them ?


Solution

  • Given the code (in C) for the Array#uniq

    rb_ary_uniq(VALUE ary)
    {
        VALUE hash, uniq, v;
        long i;
    
        if (RARRAY_LEN(ary) <= 1)
            return rb_ary_dup(ary);
        if (rb_block_given_p()) {
            hash = ary_make_hash_by(ary);
            uniq = ary_new(rb_obj_class(ary), RHASH_SIZE(hash));
            st_foreach(RHASH_TBL(hash), push_value, uniq);
        }
        else {
            hash = ary_make_hash(ary);
            uniq = ary_new(rb_obj_class(ary), RHASH_SIZE(hash));
            for (i=0; i<RARRAY_LEN(ary); i++) {
                st_data_t vv = (st_data_t)(v = rb_ary_elt(ary, i));
                if (st_delete(RHASH_TBL(hash), &vv, 0)) {
                    rb_ary_push(uniq, v);
                }
            }
        }
        ary_recycle_hash(hash);
    
        return uniq;
    }
    

    In the general case (the else block), it creates a hash from the array (that unifies the key without keeping the order). Then it create a new empty array with the right size. Finally it go through the first Array and when it finds the key in the hash it delete that key and push it to the empty array.

    Hence the order is kept.

    I'd say the complexity is O(complexity(ary_make_hash) + N) in time, which is probably O(N)


    should I understand that my elements must have a proper implementation of hash and eql? when I want to unicize them ?

    Yes exactly, they should implement both hash and eql? methods, which almost all Objects do.

    We can see that if we were to call uniq on an array of BasicObjects, which do not define the hash method, we would get an error:

    irb(main):023:0> [BasicObject.new, BasicObject.new].uniq
    Traceback (most recent call last):
            5: from /usr/bin/irb:23:in `<main>'
            #...
    NoMethodError (undefined method `hash' for #<BasicObject:0x000000014311a1e0>)