Removing Identical Objects in Ruby?


I am writing a Ruby app that searches Twitter for various things. One problem I expect to face is overlapping results between searches run close together in time. The results are returned as an array of objects, each of which is a single tweet. I know of the Array#uniq method in Ruby, which returns an array with all the duplicates removed.

My question is this: does uniq treat objects as duplicates only when they point to the same location in memory, or also when they contain identical information?

If the former, what's the best way of removing duplicates from an array based on their content?


Solution

  • Does uniq treat objects as duplicates only when they point to the same location in memory, or also when they contain identical information?

    The method relies on the hash and eql? methods: it treats two elements as duplicates when they return the same hash value and a.eql?(b) returns true. The exact behavior therefore depends on the class of the objects you are dealing with.

    Strings, for example, are considered equal if they contain the same text, regardless of whether they share the same memory allocation.

    a = b = "foo"
    c = "foo"
    
    [a, b, c].uniq
    # => ["foo"]
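To make the distinction between identity and content concrete, here is a minimal sketch that compares two separate String objects holding the same text. The variable names are only illustrative.

```ruby
a = "foo"
b = "foo".dup            # a second String object with the same characters

a.equal?(b)              # => false (identity: two separate objects)
a.eql?(b)                # => true  (same content)
a.hash == b.hash         # => true  (equal strings hash alike)

[a, b].uniq.size         # => 1
```

Because both hash and eql? agree, uniq keeps only the first of the two strings even though they live at different places in memory.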
    

    This is true for most core classes, but not for instances of your own classes, which by default inherit eql? from Object and are compared by identity.

    class Foo
    end
    
    a = Foo.new
    b = Foo.new
    
    a.eql? b
    # => false
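As a rough illustration of what this means for uniq, the sketch below reuses the empty Foo class from above. With the default eql?, only repeated references to the very same object count as duplicates.

```ruby
class Foo
end

a = Foo.new
b = Foo.new

a.equal?(b)          # => false (two distinct objects)
a.eql?(b)            # => false (Object#eql? compares identity by default)

[a, b].uniq.size     # => 2, both instances are kept
[a, a].uniq.size     # => 1, the same object listed twice is a duplicate
```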
    

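    Ruby encourages you to redefine the == operator depending on your class context. For uniq, == alone is not enough: the class must also define eql? and a hash method that returns the same value for objects that are eql?.

    In your specific case I would suggest creating an object representing a Twitter result and implementing your comparison logic so that Array#uniq behaves as you expect.

    class Result
    
      attr_accessor :text, :notes
    
      def initialize(text = nil, notes = nil)
        self.text = text
        self.notes = notes
      end
    
      # Two results are equal when they share the same class and text.
      def ==(other)
        other.class == self.class &&
        other.text  == self.text
      end
      alias :eql? :==
    
      # uniq compares hash values before calling eql?, so objects that
      # are eql? must also return the same hash.
      def hash
        [self.class, text].hash
      end
    
    end
    
    a = Result.new("first")
    b = Result.new("first")
    c = Result.new("third")
    
    [a, b, c].uniq
    # => [a, c]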
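If you would rather not change how the objects compare, uniq also accepts a block and removes duplicates based on the block's return value. The sketch below assumes each result exposes some identifying attribute; the Tweet class and its id and text attributes are hypothetical stand-ins, not part of any real Twitter client.

```ruby
# Hypothetical stand-in for a search result object.
class Tweet
  attr_reader :id, :text

  def initialize(id, text)
    @id = id
    @text = text
  end
end

first_search  = [Tweet.new(1, "hello"), Tweet.new(2, "world")]
second_search = [Tweet.new(2, "world"), Tweet.new(3, "again")]

# Keep the first element seen for each distinct id.
merged = (first_search + second_search).uniq { |tweet| tweet.id }

merged.map(&:id)
# => [1, 2, 3]
```

This keeps the comparison rule at the call site, which can be convenient when different searches need different notions of "duplicate".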