scala · large-files · bytearrayoutputstream

scala read large files


Hello, I am looking for the fastest, but still fairly high-level, way to work with a large data collection. My task consists of two parts: read a lot of large files into memory, and then perform some statistical calculations (the easiest structure to work with here is a random-access array).

My first approach was to use java.io.ByteArrayOutputStream, because it can resize its internal storage.

    import java.io.{ByteArrayOutputStream, File, FileInputStream, FileNotFoundException}
    import org.apache.commons.io.IOUtils

    // Append the whole contents of f to buf; silently skip missing files.
    def packTo(buf: ByteArrayOutputStream, f: File): Unit = {
      try {
        val fs = new FileInputStream(f)
        try IOUtils.copy(fs, buf)
        finally fs.close()               // close the stream once copied
      } catch {
        case e: FileNotFoundException => // ignore files that no longer exist
      }
    }

    val buf = new java.io.ByteArrayOutputStream()
    files foreach { f: File => packTo(buf, f) }
    println(buf.size())

    // Iterate over every triple of byte positions (`until` excludes buf.size()).
    for (i <- 0 until buf.size()) {
      for (j <- 0 until buf.size()) {
        for (k <- 0 until buf.size()) {
          // println("i " + i + " " + buf(i))
          // Calculate something amazing using buf(i), buf(j), buf(k)
        }
      }
    }

    println("amazing = " + ???)

But ByteArrayOutputStream will not hand me its internal byte[]; toByteArray() only returns a copy, and I cannot afford to hold two copies of the data in memory.
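
One way around the extra copy, sketched below with plain JDK I/O: sum the file sizes up front, allocate a single Array[Byte], and read each file straight into its slice with DataInputStream.readFully. The readAllInto helper is hypothetical (not from the question), and the sketch assumes the combined size fits in an Int (under 2 GB) and that the files do not change between the length check and the read.

    import java.io.{DataInputStream, File, FileInputStream}

    // Hypothetical helper: read every file into one pre-sized array,
    // so no second copy of the data is ever made.
    def readAllInto(files: Seq[File]): Array[Byte] = {
      val total = files.map(_.length).sum.toInt   // assumes total < 2 GB
      val data  = new Array[Byte](total)
      var offset = 0
      for (f <- files) {
        val in = new DataInputStream(new FileInputStream(f))
        try {
          val len = f.length.toInt
          in.readFully(data, offset, len)         // fill this file's slice
          offset += len
        } finally in.close()
      }
      data                                        // data(i) gives random access
    }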


Solution

  • Have you tried scala-io? Should be as simple as Resource.fromFile(f).byteArray with it.
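
A minimal sketch of that suggestion, assuming the scalax.io package from the scala-io library is on the classpath and that files is the same collection as in the question. Note the intermediate collections still copy bytes along the way, so this is simpler than the question's approach but not necessarily single-copy:

    import java.io.File
    import scalax.io.Resource

    val files: Seq[File] = ???   // the same files as above
    val data: Array[Byte] =
      files.flatMap(f => Resource.fromFile(f).byteArray).toArray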