arrays, scala, duplicates, distinct

Scala / How to remove duplicates of an array of tuples based on two values?


I have an array with millions of tuple elements like:

import scala.collection.mutable.ArrayBuffer

val arr: ArrayBuffer[(String, String)] = ArrayBuffer[(String, String)]()
arr += (("Hamburg", "Street1"))
arr += (("Hamburg", "Street2"))
arr += (("Hamburg", "Street1")) // duplicate - remove
arr += (("Berlin",  "StreetA"))
arr += (("Berlin",  "StreetZ"))
arr += (("Berlin",  "StreetZ")) // duplicate - remove
arr += (("Berlin",  "StreetA")) // duplicate - remove

I would now like to remove the duplicates from that array, i.e. the entries where both city AND street are equal. Something like:

arr.distinctBy(_._1&_._2) // doesn't work, just for illustration

Is there a simple way to do this, so that the output looks like:

(("Hamburg", "Street1"))
(("Hamburg", "Street2"))
(("Berlin",  "StreetA"))
(("Berlin",  "StreetZ"))

Solution

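  • Since equals and hashCode are overridden for tuples (two tuples are equal when all their elements are equal), you can use distinct, which is effectively distinctBy(identity). It keeps the first occurrence of each tuple and preserves the original order: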

    val result = arr.distinct
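
For completeness, here is a minimal, self-contained sketch of the above using the data from the question. The object name DistinctDemo and the values onePerCity and byBothFields are made up for illustration, and distinctBy assumes Scala 2.13 or newer:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical wrapper object, only there so the snippet compiles on its own.
object DistinctDemo {
  val arr: ArrayBuffer[(String, String)] = ArrayBuffer(
    ("Hamburg", "Street1"),
    ("Hamburg", "Street2"),
    ("Hamburg", "Street1"), // duplicate
    ("Berlin", "StreetA"),
    ("Berlin", "StreetZ"),
    ("Berlin", "StreetZ"), // duplicate
    ("Berlin", "StreetA")  // duplicate
  )

  // distinct compares whole tuples, so city AND street must both match.
  val result: ArrayBuffer[(String, String)] = arr.distinct

  // Spelling the key out explicitly gives the same result for pairs.
  val byBothFields: ArrayBuffer[(String, String)] =
    arr.distinctBy(t => (t._1, t._2))

  // For contrast: keying on the city alone keeps one street per city.
  val onePerCity: ArrayBuffer[(String, String)] = arr.distinctBy(_._1)

  def main(args: Array[String]): Unit =
    result.foreach(println)
}
```

distinct returns a new collection and leaves arr untouched, so the result has to be assigned to a value (or back to a var) before it can be used.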