Search code examples
kotlinfunctional-programming

How to split a list into sublists using a predicate with Kotlin?


I am trying an idiomatic, and ideally functional way to split a list into sublists in Kotlin .

Imagine the input being ["aaa", "bbb", "", "ccc", "", "ddd", "eee", "fff"], I want to return [["aaa", "bbb"], ["ccc"], ["ddd", "eee", "fff"]] for the given predicate string.isEmpty().

It is quite simple to do with a for loop and an accumulator; but I haven't found a way to write it functionally that I find readable enough.

So far my best outcome is :

lines.foldIndexed(Pair(listOf<List<String>>(), listOf<String>()), { idx, acc, line ->
    when {
    idx + 1 == lines.size -> {
        Pair(acc.first + listOf(acc.second + line), listOf())
    }
    line.isEmpty() -> {
        Pair(acc.first + listOf(acc.second), listOf())
    }
    else -> {
        Pair(acc.first, acc.second + line)
        }
    }
}).first

Essentially, I am using a fold with a double accumulator that keeps track of the current list and resets when the predicate is found. The list feeds into the complete result at that point. I am using a foldIndexed in order to get my last list in.

Do you folks know of any better way ?

For reference, a loop version could be

val data = mutableListOf<String>()
var currentData = ""
for(line in lines){
    if(line.isEmpty()) {
        data.add(currentData)
        currentData = ""
    }
    else{
        currentData = "$currentData $line"
    }
}
data.add(currentData)

Thanks !


Solution

  • I'd suggest to find splitting points (manually adding edge indices) first and then do slices:

    val lines = listOf("aaa", "bbb", "", "ccc", "", "ddd", "eee", "fff")
    val result = lines
        .flatMapIndexed { index, x ->
            when {
                index == 0 || index == lines.lastIndex -> listOf(index)
                x.isEmpty() -> listOf(index - 1, index + 1)
                else -> emptyList()
            }
        }
        .windowed(size = 2, step = 2) { (from, to) -> lines.slice(from..to) }
    println(result) //[[aaa, bbb], [ccc], [ddd, eee, fff]]