Search code examples
ruby-on-railsruby

unexpected behavior of delete_if and reject! methods


I have an array of elements in Ruby

nums = [1,1,1,2,2,3]

The task is to remove some duplicates in-place if element occures in array more than twice.

I wrote following code:

nums = [1,1,1,2,2,3]

def remove_elements(nums)
  nums.delete_if{ |num| nums.count(num) > 2 } #reject! returns the same result
end

So i expected this result

[2,2,3]

But got

[2,3]

However reject method returns expected array

def remove_elements(nums)
  nums.reject{ |num| nums.count(num) > 2 } # returns [2,2,3]
end

Solution

  • This is due to the way delete_if / reject! (and their counterparts keep_if / select!) work internally. Let's take a look at another example which removes even numbers while printing nums at each step:

    nums = [0, 1, 2, 3, 4, 5, 6]
    
    def remove_elements(nums)
      nums.reject! do |num|
        p nums: nums
        num.even?
      end
    end
    
    p result: remove_elements(nums)
    

    Output:

    {:nums=>[0, 1, 2, 3, 4, 5, 6]}
    {:nums=>[0, 1, 2, 3, 4, 5, 6]}
    {:nums=>[1, 1, 2, 3, 4, 5, 6]}
    {:nums=>[1, 1, 2, 3, 4, 5, 6]}
    {:nums=>[1, 3, 2, 3, 4, 5, 6]}
    {:nums=>[1, 3, 2, 3, 4, 5, 6]}
    {:nums=>[1, 3, 5, 3, 4, 5, 6]}
    {:result=>[1, 3, 5]}
    

    As you can see, nums is being modified in a seemingly strange way. But what happens here exactly?

    Instead of creating a temporary array, Ruby's implementation re-uses the existing array to store the result. As it is traversing the array, it copies each element that should be kept (for which the block returns a falsy result) to the front, overwriting any existing element. In the end, the array gets truncated to its final size: (ASCII art ahead)

    [0, 1, 2, 3, 4, 5, 6]
     ┌──┘
    [1, 1, 2, 3, 4, 5, 6]
        ┌─────┘
    [1, 3, 2, 3, 4, 5, 6]
           ┌────────┘
    [1, 3, 5, 3, 4, 5, 6]
    
    [1, 3, 5]
     ───┬───
      result
    

    Note that 1, 3 and 5 occur twice in the array intermediately. That's why your nums.count doesn't work as expected – you're counting elements while the array is being restructured.


    To solve this problem, you could count the elements in advance, e.g. via tally:

    nums = [1, 1, 1, 2, 2, 3]
    
    counts = nums.tally
    #=> {1=>3, 2=>2, 3=>1}
    
    nums.delete_if { |num| counts[num] > 2 }
    #=> [2, 2, 3]
    

    This is also faster because you don't have to traverse the entire array for each element again in order to count it.