python list scala split overlapping

Split a large sequence of logs into smaller sequences with overlapping start times


I would like to split a large number of logs into smaller sequences whose start times overlap.

For example, let's say we have:

largeLogs = {
[startTime=A, duration=22],
[startTime=B, duration=12],
[startTime=C, duration=34],
[startTime=D, duration=12],
[startTime=E, duration=18],
[startTime=F, duration=8]
}

The requested output should be:

{[[startTime=A, duration=22],
[startTime=B, duration=12],
[startTime=C, duration=34]],

[[startTime=B, duration=12],
[startTime=C, duration=34],
[startTime=D, duration=12]],

[[startTime=C, duration=34],
[startTime=D, duration=12],
[startTime=E, duration=18]]}

I wrote it in Python as follows:

def split_func(batchSize, logs):
    batchSize = min(batchSize, len(logs)-1)
    return [logs[i:i+batchSize] for i in range(len(logs) - batchSize+1)]

As I am new to Scala, I tried to write it as follows, but I am getting an error and am stuck on the last line:

def split_func(batchSize:Int, partialLogs: ListBuffer[Array[Byte]] ) : ListBuffer[Array[Byte]] = {

    batchSize = Math.min(batchSize, partialLogs.size - 1) // getting error reassignment to val

    val i = 0 to partialLogs.size - batchSize+1

    return [lst[i:i+n] // no idea how to change this line from python to scala

Solution

  • There is a Scala method called sliding that will do what you want:

    partialLogs.sliding(batchSize, batchSize-overlapSize)
    

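    The first parameter is the size of each block and the second is the step, that is, the distance between the starts of consecutive blocks. In the example from the question each block starts one log after the previous one, so the step is 1 (overlapSize = batchSize - 1). Note that sliding returns an Iterator, so call .toList (or similar) if you need a materialized collection.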
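
    As a rough sketch of how the function from the question might look with sliding: the Log type alias (a startTime/duration pair), the object name SlidingExample, and the List[List[Log]] return type are assumptions made for illustration, not part of the original code. The clamped size is bound to a new val, since method parameters cannot be reassigned in Scala, and it is additionally kept at 1 or more because sliding rejects a non-positive size.

```scala
import scala.collection.mutable.ListBuffer

object SlidingExample {
  // Hypothetical stand-in for a log entry: (startTime, duration).
  type Log = (String, Int)

  // Returns every window of `size` consecutive logs; each window starts
  // one element after the previous one (step = 1).
  def splitFunc(batchSize: Int, logs: ListBuffer[Log]): List[List[Log]] = {
    // Parameters are vals, so bind the clamped value to a new name.
    // The lower bound of 1 is an added guard: sliding requires size >= 1.
    val size = math.max(1, math.min(batchSize, logs.size - 1))
    logs.sliding(size, 1).map(_.toList).toList
  }

  def main(args: Array[String]): Unit = {
    val logs = ListBuffer("A" -> 22, "B" -> 12, "C" -> 34, "D" -> 12, "E" -> 18, "F" -> 8)
    splitFunc(3, logs).foreach(println)
  }
}
```

    With the six logs from the question and batchSize = 3, this yields four windows (A to C, B to D, C to E, D to F), the same as the Python version. There is no need for an explicit return; the last expression is the result.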