Extract range of bytes from file in Scala


I have a binary file from which I need to extract a range of bytes, from start: Long to end: Long. I need Long because the file is several gigabytes. My app needs to return the result as a ByteString. I tried:

val content: Array[Byte] = Array()
val stream: FileInputStream = new FileInputStream(file: File)
stream.skip(start)
stream.read(content, 0, end-start)

but read only accepts an Int length, not a Long (is this a bug? skip is fine with Long...). I would also need to convert the result to a ByteString. I would love to do this instead:

val stream: FileInputStream = new FileInputStream(file: File)
stream.skip(start)
org.apache.commons.io.IOUtils.toByteArray(stream)

but how do I tell it where to end? The stream has no takeWhile or take method. Then I tried:

val source = scala.io.Source.fromFile(file: File)
source.drop(start).take(end-start)

Again, drop only accepts an Int...

How can I do this?


Solution

  • Use IOUtils.toByteArray(InputStream input, long size)

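    import java.io.FileInputStream
    import org.apache.commons.io.IOUtils

    val stream = new FileInputStream(file)
    stream.skip(start) // note: skip is allowed to skip fewer bytes than requested
    val bytesICareAbout = IOUtils.toByteArray(stream, end - start)
    // form the ByteString from bytesICareAbout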
    

    Note that this throws an IllegalArgumentException if end - start is greater than Integer.MAX_VALUE, and for good reason: you wouldn't want an array of 2 GB or more allocated in memory.
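
    For ranges that do fit in a single array, the same result can be had without Commons IO. The following is only a sketch of an alternative, not what the snippet above does: it swaps skip for RandomAccessFile.seek, which takes a Long offset. The helper name readRange is hypothetical, and the range is assumed to be half-open, [start, end).

    ```scala
    import java.io.{File, RandomAccessFile}

    // Hypothetical helper: reads bytes [start, end) of `file` into one array.
    def readRange(file: File, start: Long, end: Long): Array[Byte] = {
      val length = end - start
      require(start >= 0 && length >= 0, "need 0 <= start <= end")
      require(length <= Int.MaxValue, "range too large for a single array")
      val raf = new RandomAccessFile(file, "r")
      try {
        raf.seek(start)                          // seek accepts a Long offset
        val buffer = new Array[Byte](length.toInt)
        raf.readFully(buffer)                    // throws EOFException if the file ends early
        buffer
      } finally raf.close()
    }
    ```

    The returned array can then be wrapped in whatever ByteString type the app uses. If that happens to be Akka's akka.util.ByteString (an assumption, the question does not say), ByteString(bytes) would do it.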

    If end - start does exceed Integer.MAX_VALUE, you should definitely avoid allocating a single ByteString to hold the data. Instead, bound the stream and consume it incrementally, with something like:

    import org.apache.commons.io.input.BoundedInputStream
    
    val stream = new FileInputStream(file)
    stream.skip(start)
    val boundedStream = new BoundedInputStream(stream, end - start) // at most end - start bytes can be read
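
    A bounded stream still has to be consumed piece by piece for the memory saving to materialize. As a rough sketch of that idea using only java.io (it tracks the remaining byte count by hand rather than relying on BoundedInputStream, so it runs without Commons IO), something like the following could work. The helper name foreachChunk, its callback shape, and the default chunk size are all hypothetical.

    ```scala
    import java.io.{EOFException, File, FileInputStream}

    // Hypothetical helper: passes bytes [start, end) of `file` to `handle` in
    // chunks of at most `chunkSize` bytes, so the range is never fully in memory.
    // `handle` receives the shared buffer and the number of valid bytes in it.
    def foreachChunk(file: File, start: Long, end: Long, chunkSize: Int = 8192)
                    (handle: (Array[Byte], Int) => Unit): Unit = {
      require(chunkSize > 0 && start >= 0 && end >= start)
      val stream = new FileInputStream(file)
      try {
        // skip may skip fewer bytes than asked, so loop until `start` is reached
        var toSkip = start
        while (toSkip > 0) {
          val skipped = stream.skip(toSkip)
          if (skipped <= 0) throw new EOFException("could not skip to start")
          toSkip -= skipped
        }
        val buffer = new Array[Byte](chunkSize)
        var remaining = end - start              // a Long, so ranges over 2 GB are fine
        var eof = false
        while (remaining > 0 && !eof) {
          val n = stream.read(buffer, 0, math.min(chunkSize.toLong, remaining).toInt)
          if (n < 0) eof = true                  // file ended before `end`
          else { handle(buffer, n); remaining -= n }
        }
      } finally stream.close()
    }
    ```

    The buffer is reused between calls, so handle should copy out whatever it wants to keep, for example with buffer.take(n), before returning.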