How to use async filter with big data?


How to filter a list with async?

I have a big list of names. How can I filter it and get the URIs using async methods?

suspend fun getContactData(): List<ContactData> = coroutineScope {
    val names = async(Dispatchers.IO) { getNames() }.await()
    val emails = async(Dispatchers.IO) { getEmails() }.await()

    names.filter { contact -> emails[contact.id] != null }.map { contact ->
        val contactEmails = emails[contact.id].orEmpty()
        ContactData(
            contact = Contact(contact.id, contact.name, contactEmails, contactEmails),
            uri = getContactUri(contactEmails)
        )
    }
}

Solution

  • What counts as big? If the data is really big, it should be stored in a database with indexes, and the filtering can then be done in SQL.

    However, if the list that you have in memory is still too big to be filtered on the main thread, you can just move it to the Default dispatcher. Nothing more to do. Your coroutine will suspend until the filtering is done, but it won't block the main thread. If you don't want your code to suspend, just launch a new coroutine and call getContactData from there. You can use either a callback to return the data when the filtering is done or launch the coroutine with async and wait for the result at some later point.
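As a rough sketch of the two options just described (callback versus async and a later await), the following assumes a hypothetical `Person` type and `filterNames` helper that are not part of the original question, and uses `runBlocking` only to make the example self-contained:

```kotlin
import kotlinx.coroutines.*

// Hypothetical stand-in for the real contact type (assumption, not from the question).
data class Person(val id: Int, val name: String)

// CPU-bound filtering moved off the calling thread onto Dispatchers.Default.
// The caller suspends until the result is ready but its thread is not blocked.
suspend fun filterNames(people: List<Person>, wanted: Set<Int>): List<Person> =
    withContext(Dispatchers.Default) {
        people.filter { it.id in wanted }
    }

fun main() = runBlocking {
    val people = List(1_000) { Person(it, "name$it") }
    val wanted = setOf(1, 2, 3)

    // Option 1: launch a coroutine and hand the result to a callback-style block.
    val job = launch {
        val result = filterNames(people, wanted)
        println("callback got ${result.size} entries")
    }

    // Option 2: start with async and wait for the result at some later point.
    val deferred = async { filterNames(people, wanted) }
    // ... other work could happen here ...
    println("awaited ${deferred.await().size} entries")

    job.join()
}
```

In an Android app you would typically use an existing scope such as a lifecycle-bound one instead of `runBlocking`.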

    If you were looking to distribute the filtering over multiple threads to speed it up, the Kotlin standard library currently has no built-in parallel collection operations for that. You could use Java parallel streams instead (only if you target the JVM, of course), but in my experience this rarely yields a measurable improvement. You should only consider it if your filtering takes time in the range of seconds, for which you would probably need several million entries.
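For completeness, a minimal sketch of the Java streams route on the JVM could look like this. The `Entry` type and `filterParallel` function are hypothetical names chosen for illustration; whether this is actually faster than a plain `filter` has to be measured on your data:

```kotlin
import java.util.stream.Collectors

// Hypothetical element type (assumption, not from the question).
data class Entry(val id: Int, val name: String)

// Filters the list with a parallel stream, which runs on the common ForkJoinPool.
// The source list is ordered, so the collected result keeps the original order.
fun filterParallel(entries: List<Entry>, wanted: Set<Int>): List<Entry> =
    entries.parallelStream()
        .filter { it.id in wanted }
        .collect(Collectors.toList())

fun main() {
    val entries = List(1_000_000) { Entry(it, "name$it") }
    val matches = filterParallel(entries, setOf(3, 5, 8))
    println(matches.map { it.id })
}
```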

    That said, independent of coroutines and threads, your filtering can be sped up quite a bit. You iterate over the names twice (once in filter, once in map). The two passes can be fused into a single mapNotNull, which does the mapping and the filtering in one iteration and also avoids the second map lookup per contact:

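    suspend fun getContactData(): List<ContactData> {
        // Plain suspending calls; no extra coroutine is needed for sequential work.
        val names = getNames()
        val emails = getEmails()

        // Run the CPU-bound mapping off the main thread.
        return withContext(Dispatchers.Default) {
            // mapNotNull skips contacts without an email entry and maps the rest in one pass.
            names.mapNotNull { contact ->
                emails[contact.id]?.let { email ->
                    ContactData(
                        contact = Contact(contact.id, contact.name, email, email),
                        uri = getContactUri(email),
                    )
                }
            }
        }
    }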
    

    Note that I removed the async/await calls you used for getNames and getEmails. All they did was launch a new coroutine and then immediately wait for it to finish. That's not what coroutines are for; you can simply call getNames and getEmails directly from the current coroutine. If the intent was to move the execution to the IO dispatcher, you should do that with withContext inside those functions, not at the call site.
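A sketch of what moving the dispatcher switch into the functions could look like follows. The question does not show the real signatures of getNames and getEmails, so the `Name` type, the return types, and the placeholder data below are assumptions made purely for illustration:

```kotlin
import kotlinx.coroutines.*

// Hypothetical type; the original post does not show the real one.
data class Name(val id: Long, val name: String)

// Each function moves its own blocking work to the IO dispatcher,
// so callers can invoke it from any coroutine without wrapping the call.
suspend fun getNames(): List<Name> = withContext(Dispatchers.IO) {
    // Placeholder for a blocking read (for example a database or content provider query).
    listOf(Name(1, "Ada"), Name(2, "Grace"))
}

suspend fun getEmails(): Map<Long, String> = withContext(Dispatchers.IO) {
    // Placeholder for a blocking read.
    mapOf(1L to "ada@example.com")
}

fun main() = runBlocking {
    // Direct calls: no async/await and no dispatcher argument at the call site.
    val names = getNames()
    val emails = getEmails()
    println(names.mapNotNull { n -> emails[n.id]?.let { n.name to it } })
}
```

With this shape the caller never needs to know which dispatcher a function requires, which is the usual convention for suspending functions.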