Java/Kotlin: Finding the intersection of multiple HashSets by class ID


I'm having trouble finding the intersection of an ArrayList of HashSets that contain instances of a class (I want to intersect by identifier):

class Protein(val id: String, val score: Double, val molw: Double, val spc: Int)

I've pulled in some data from a .csv file into this type of structure:

ArrayList<HashSet<Protein>>

So I have an ArrayList of six hash sets (one per CSV file), each containing thousands of Protein objects. Here's what I've tried so far to get an intersection HashSet based on common Protein.id:

fun intersection(data: ArrayList<HashSet<Protein>>): HashSet<Protein> {
    val intersectionSet = HashSet<Protein>(data[0])

    for (i in 1..data.size) {
        intersectionSet.retainAll(data[i])
    }
    return intersectionSet
}

This returns an empty set, which makes sense given that it's trying to match whole Protein objects rather than just their ids.

How do I use data[i].id as my intersection criterion? I'm fairly new to Kotlin and data classes :)


Solution

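  • HashSet.retainAll decides membership using equals and hashCode. Protein does not override them, so two instances are equal only if they are the same object, and proteins loaded from different files never match even when their ids agree. If you define hashCode and equals in the Protein class as follows, the HashSet will compute the intersection using the id field: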

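    class Protein(val id: String, val score: Double, val molw: Double, val spc: Int) {
      // Two proteins count as equal when their ids match; the other fields are ignored
      override fun hashCode() = id.hashCode()
      // The `is` check returns false for null or non-Protein arguments instead of throwing
      override fun equals(other: Any?) = other is Protein && id == other.id
    }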
    

    You also need to change the loop range in your intersection function from 1..data.size to 1 until data.size (equivalently 1..(data.size - 1)), because 1..data.size includes data.size itself and data[data.size] is out of bounds. Alternatively, you could write it functionally as follows:

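    fun intersection(data: ArrayList<HashSet<Protein>>): HashSet<Protein> {
      // Fold over a copy of the first set so that none of the input sets is mutated
      return data.drop(1).fold(HashSet<Protein>(data.first())) { acc, set -> acc.apply { retainAll(set) } }
    }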
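
If you would rather not redefine equality for Protein (overriding equals this way also means two proteins with the same id but different scores collapse into one entry in any set or map), one option is to intersect the ids first and then filter. Below is a minimal sketch of that idea; intersectionById is a hypothetical helper name, not something from the original post, and it assumes you want to keep the Protein instances from the first set.

```kotlin
// Same shape as the Protein class in the question; equals/hashCode are left untouched.
class Protein(val id: String, val score: Double, val molw: Double, val spc: Int)

// Hypothetical helper: keeps the proteins from the first set whose id
// appears in every one of the sets.
fun intersectionById(data: List<Set<Protein>>): Set<Protein> {
    if (data.isEmpty()) return emptySet()

    // Collect the ids that occur in every set.
    val commonIds = data
        .map { set -> set.map { it.id }.toSet() }
        .reduce { acc, ids -> acc intersect ids }

    // Keep the Protein instances from the first set that carry a common id.
    return data.first().filter { it.id in commonIds }.toSet()
}

fun main() {
    val a = hashSetOf(Protein("P1", 1.0, 10.0, 1), Protein("P2", 2.0, 20.0, 2))
    val b = hashSetOf(Protein("P2", 2.5, 20.0, 3), Protein("P3", 3.0, 30.0, 4))
    val c = hashSetOf(Protein("P2", 9.9, 20.0, 5), Protein("P1", 1.1, 10.0, 6))

    val common = intersectionById(listOf(a, b, c))
    println(common.map { it.id })  // prints [P2]
}
```

An ArrayList<HashSet<Protein>> can be passed to this function directly, since it is also a List<Set<Protein>>. The input sets are not modified.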