
OutOfMemory error when using Apache Commons lineIterator


I'm trying to iterate line by line over a 1.2 GB file using Apache Commons FileUtils.lineIterator. However, as soon as the LineIterator calls hasNext(), I get a java.lang.OutOfMemoryError: Java heap space. I've already allocated 1 GB to the Java heap.

What am I doing wrong here? From the docs I've read, isn't LineIterator supposed to read the file from the file system line by line rather than load it all into memory?

Note the code is in Scala:

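  val file = new java.io.File("data_export.dat")
  val it = org.apache.commons.io.FileUtils.lineIterator(file, "UTF-8")
  var successCount = 0L
  var totalCount = 0L
  try {
    // The OutOfMemoryError is thrown here, on the first call to hasNext()
    while (it.hasNext()) {
      try {
        val legacy = parse[LegacyEvent](it.nextLine())
        BehaviorEvent(legacy)
        successCount += 1L
      } catch {
        // Report why the line failed instead of discarding the exception
        case e: Exception => println(s"Parse error: ${e.getMessage}")
      }
      totalCount += 1L
    }
  } finally {
    // Always release the underlying file handle
    it.close()
  }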

Thanks for your help here!


Solution

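  • The code looks fine. Most likely the iterator never finds a line terminator in the file, so it tries to read one enormous line that is too large to fit in the 1 GB heap.

    Run wc -l on the file in a Unix shell and see how many lines it reports. A count close to zero for a 1.2 GB file would confirm this.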
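
    If wc is not available, the same check can be sketched on the JVM. The class below is only an illustration, not part of Commons IO: the names LongestLine and scan are made up for this example, and it assumes lines are terminated by '\n'. It streams the file one byte at a time through a small buffer, so it does not itself need a large heap, and it reports the line count together with the length in bytes of the longest line. Unlike wc -l, it also counts a final line that has no trailing newline.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for diagnosing oversized lines; not a Commons IO class.
final class LongestLine {

    /** Returns {lineCount, longestLineInBytes}, splitting on '\n' only (an assumption). */
    static long[] scan(Path path) throws IOException {
        long lines = 0;
        long current = 0;
        long longest = 0;
        // A 64 KB buffer keeps memory use constant regardless of line length.
        try (InputStream in = new BufferedInputStream(Files.newInputStream(path), 1 << 16)) {
            int b;
            while ((b = in.read()) != -1) {
                if (b == '\n') {
                    lines++;
                    longest = Math.max(longest, current);
                    current = 0;
                } else {
                    current++;
                }
            }
        }
        // Count a final line that is not terminated by a newline.
        if (current > 0) {
            lines++;
            longest = Math.max(longest, current);
        }
        return new long[] { lines, longest };
    }

    public static void main(String[] args) throws IOException {
        long[] result = scan(Paths.get(args[0]));
        System.out.println("lines=" + result[0] + " longestLineBytes=" + result[1]);
    }
}
```

    Running it against data_export.dat and seeing a longest line of hundreds of megabytes would point to missing or unexpected line terminators in the export rather than to a problem in the iteration code.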