Using CsvMapper in Kotlin to parse huge CSV file


I have a CSV file generated as a report by the server. The file has more than 100k lines of text. I'm reading it in Kotlin using CsvMapper, but I end up with an IOException.

Here's the code that I've implemented:

//Declare the mapper object
private var csvMapper = CsvMapper().registerModule(KotlinModule())
        
//Generate Iterator
inline fun <reified T> getIterator(fileName: String): MappingIterator<T>? {
    csvMapper.disable(JsonParser.Feature.AUTO_CLOSE_SOURCE)
    FileReader(fileName).use { reader ->
        return csvMapper
                       .readerFor(T::class.java)
                       .without(StreamReadFeature.AUTO_CLOSE_SOURCE)
                       .with(CsvSchema.emptySchema().withHeader())
                       .readValues<T>(reader)
    }
}
    
//Read the file using iterator
fun read(csvFile: String) {
    val iterator = getIterator<BbMembershipData>(csvFile)
    if (iterator != null) {
          while (iterator.hasNext()) {
                try {
                     val lineElement = iterator.next()
                     println(lineElement)
                } catch (e: RuntimeJsonMappingException) {
                    println("Iterator Exception: " + e.localizedMessage)
                }
          }
     }
}

After printing about 10 lines of the file, it throws the exception below:

Exception in thread "main" java.lang.RuntimeException: Stream closed
    at com.fasterxml.jackson.databind.MappingIterator._handleIOException(MappingIterator.java:420)
    at com.fasterxml.jackson.databind.MappingIterator.next(MappingIterator.java:203)
    at Main$FileProcessor.read(Main.kt:39)
    at Main.main(Main.kt:54)
Caused by: java.io.IOException: Stream closed

How can I prevent the "Stream Closed" exception?


Solution

  • The way CsvMapper works is that it reads lazily, rather than reading the whole file the moment you call readValues.

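    When the end of the use block is reached, basically nothing has actually been read yet! You only start reading the file when you start using the iterator, but by then, the file is already closed! (The few lines that did print presumably came from data the parser had already buffered before the reader was closed.)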

    Therefore, read needs to open and close the file, not getIterator:

    //Generate Iterator
    inline fun <reified T> getIterator(reader: Reader): MappingIterator<T>? {
        csvMapper.disable(JsonParser.Feature.AUTO_CLOSE_SOURCE)
        return csvMapper
                       .readerFor(T::class.java)
                       .without(StreamReadFeature.AUTO_CLOSE_SOURCE)
                       .with(CsvSchema.emptySchema().withHeader())
                       .readValues<T>(reader)
    }

    //Read the file using iterator
    fun read(csvFile: String) {
        FileReader(csvFile).use { reader ->
            val iterator = getIterator<BbMembershipData>(reader)
            if (iterator != null) {
                while (iterator.hasNext()) {
                    try {
                        val lineElement = iterator.next()
                        println(lineElement)
                    } catch (e: RuntimeJsonMappingException) {
                        println("Iterator Exception: " + e.localizedMessage)
                    }
                }
            }
        }
    }
    

    Notice that the use block now ends in read, after the loop has consumed the whole file, rather than in getIterator, before anything has been consumed.
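
The same pitfall can be reproduced without Jackson, which may make the cause easier to see. The following is only a minimal sketch using the Kotlin standard library: `SAMPLE`, `lazyLinesBroken`, `countLinesFixed`, and `brokenFails` are hypothetical names made up for this illustration, and a `StringReader` stands in for the real file. `lineSequence()` is lazy in the same way as the `MappingIterator`, so letting it escape the `use` block fails, while consuming it inside the block works.

```kotlin
import java.io.BufferedReader
import java.io.IOException
import java.io.StringReader

// Hypothetical stand-in for the contents of the CSV file.
const val SAMPLE = "id,name\n1,alice\n2,bob\n"

// Broken: the lazy sequence escapes the use block, so the reader
// is already closed by the time the first line is requested.
fun lazyLinesBroken(): Sequence<String> =
    BufferedReader(StringReader(SAMPLE)).use { it.lineSequence() }

// Fixed: the sequence is fully consumed before the use block ends.
fun countLinesFixed(): Int =
    BufferedReader(StringReader(SAMPLE)).use { it.lineSequence().count() }

// Returns true if iterating the escaped sequence fails with an IOException.
fun brokenFails(): Boolean = try {
    lazyLinesBroken().count()
    false
} catch (e: IOException) {
    true
}

fun main() {
    println("Broken version fails: ${brokenFails()}")
    println("Fixed version line count: ${countLinesFixed()}")
}
```

The rule of thumb this suggests: whatever opens the resource should also be the scope that consumes the lazy iterator or sequence built on top of it.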