I have an array of elements in Ruby
nums = [1,1,1,2,2,3]
The task is to remove some duplicates in-place if element occures in array more than twice.
I wrote following code:
nums = [1,1,1,2,2,3]
def remove_elements(nums)
nums.delete_if{ |num| nums.count(num) > 2 } #reject! returns the same result
end
So i expected this result
[2,2,3]
But got
[2,3]
However reject
method returns expected array
def remove_elements(nums)
nums.reject{ |num| nums.count(num) > 2 } # returns [2,2,3]
end
This is due to the way delete_if
/ reject!
(and their counterparts keep_if
/ select!
) work internally. Let's take a look at another example which removes even numbers while printing nums
at each step:
nums = [0, 1, 2, 3, 4, 5, 6]
def remove_elements(nums)
nums.reject! do |num|
p nums: nums
num.even?
end
end
p result: remove_elements(nums)
Output:
{:nums=>[0, 1, 2, 3, 4, 5, 6]}
{:nums=>[0, 1, 2, 3, 4, 5, 6]}
{:nums=>[1, 1, 2, 3, 4, 5, 6]}
{:nums=>[1, 1, 2, 3, 4, 5, 6]}
{:nums=>[1, 3, 2, 3, 4, 5, 6]}
{:nums=>[1, 3, 2, 3, 4, 5, 6]}
{:nums=>[1, 3, 5, 3, 4, 5, 6]}
{:result=>[1, 3, 5]}
As you can see, nums
is being modified in a seemingly strange way. But what happens here exactly?
Instead of creating a temporary array, Ruby's implementation re-uses the existing array to store the result. As it is traversing the array, it copies each element that should be kept (for which the block returns a falsy result) to the front, overwriting any existing element. In the end, the array gets truncated to its final size: (ASCII art ahead)
[0, 1, 2, 3, 4, 5, 6]
┌──┘
[1, 1, 2, 3, 4, 5, 6]
┌─────┘
[1, 3, 2, 3, 4, 5, 6]
┌────────┘
[1, 3, 5, 3, 4, 5, 6]
[1, 3, 5]
───┬───
result
Note that 1
, 3
and 5
occur twice in the array intermediately. That's why your nums.count
doesn't work as expected – you're counting elements while the array is being restructured.
To solve this problem, you could count the elements in advance, e.g. via tally
:
nums = [1, 1, 1, 2, 2, 3]
counts = nums.tally
#=> {1=>3, 2=>2, 3=>1}
nums.delete_if { |num| counts[num] > 2 }
#=> [2, 2, 3]
This is also faster because you don't have to traverse the entire array for each element again in order to count it.