I'm currently working on a Kotlin project to automate the upkeep of emergency location data. As part of that project I need to compare two collections of Json objects to determine what data needs to be added/updated/deleted. Collection A is the collection of existing data. Collection B is the incoming data from the job. While the objects in the collections are not identical, they have a unique id that can be used to link them.
For each item that exists in A but not in B I need to make a delete call. For each item that is in B but not in A I need to do a create. For each item that exists in A and B I need to do an update.
I need to figure out a way to determine the needed operations, each of which will involve an HTTP request to a 3rd party API, and execute the needed operations as efficiently as possible. I know I could simply brute force the problem by iterating over all the items in each collection. However since this is going to be part of an AWS Lambda, I don't think that is going to cut it.
What is the most efficient approach to a problem like this using Kotlin?
Update: I ended up going with the solution from @broot however it required an extra step. Since the objects in each list are from two different classes, I needed to create an interface to define their common properties. When I was done my classes looked like
interface HasUid{
val uid:String
}
data class OldData(val id:String, val name:String, val mac:String):HasUid {
override val uid: String
get() = name
}
data class NewData(val identifier:String, val display_name:String, val mac:String):HasUid {
override val uid: String
get() = display_name
}
and I called the function with
val(toDelete,toAdd,toUpdate) = diffSetsBy(oldDataList, newDataList, HasUid::uid)
println("to update: " + toUpdate.size)
println("to add: " + toAdd.size)
println("to delete: " + toDelete.size)
If I understand the question correctly, it is simply about comparing two sets of objects that are stored already in the memory, by some kind of a key. There isn't really too much rocket science or performance optimizations we can involve here. The only optimization is to not compare each item with each item (that would be O(N*M)
), but instead, create a hash set/map and search in it (O(N+M)
):
fun <T : Any> diffSetsBy(setA: Iterable<T>, setB: Iterable<T>, keySelector: (T) -> Any?): SetsDiff<T> {
val onlyA = setA.associateByTo(mutableMapOf(), keySelector)
val onlyB = mutableListOf<T>()
val both = mutableListOf<T>()
for (b in setB) {
if (onlyA.remove(keySelector(b)) != null) {
both += b
} else {
onlyB += b
}
}
return SetsDiff(onlyA.values, onlyB, both)
}
data class SetsDiff<T>(val onlyA: Iterable<T>, val onlyB: Iterable<T>, val both: Iterable<T>)
Algorithm requires that keys are not duplicated in a single set. I used a traditional imperative code as I don't see a way to make the code more functional or idiomatic and keep the readability. Also, as I created a generic diffSetsBy
function, it would probably make sense to return pairs of objects for both
, but we don't need it for our case and I was too lazy.
We can use it like this:
fun main() {
val users = listOf(User("1", "James"), User("2", "Jonh"))
val newUsers = listOf(User("2", "John"), User("3", "Dave"))
val (toDelete, toAdd, toUpdate) = diffSetsBy(users, newUsers, User::login)
println(toDelete) // [User(login=1, name=James)]
println(toAdd) // [User(login=3, name=Dave)]
println(toUpdate) // [User(login=2, name=John)]
}
data class User(val login: String, val name: String)