Basically I have a very large array of objects and I need to remove 10% of the least fit objects.
Each object has a fitness variable (a double) associated with it. I dont have a number which determines whether or not an object is fit, I just want the least fit of the bunch.
What is the best way of retrieving (sampling) the least fit objects?
One way could be randomly select lets way 20%, sort the data, then remove 10%. But I think this is not a very smart way of doing it.
The other way it to keep the array sorted at all times then remove the first 10%. But I dont think this is very good because you will have to always keep sorting the array whilst inserting/updating which is a big overhead..
Let k be yourCollection.length() * 0.1
and n = yourCollection.length()
.
Find the k-th smallest element (QuickSelect or Median of 5), where the key is your fitness. Let's call it p
. This can be done in O(n)
.
Then traverse through the collection and remove all the items with fitness less than p.fitness. We've got an O(n)
solution.
Or, you can create a heap in O(n)
with key=fitness
and remove k
elements from it in O(k * log(n))
.