Tags: android, json, kotlin, klaxon

Parse very large JSON files with dynamic data


I need to parse very large JSON files that are downloaded from a server. These JSON files can contain completely different keys and values. Here are some examples...

{ "result": "PASS", 
  "items": [ 
    { "name": "John", "age": 33 }, 
    { "name": "Jane", "age": 23 } ] 
}


{ "result": "PASS", 
  "items": [ 
    { "make": "Ford", "model": "Mustang", "colors": ["blue", "red", "silver"] }, 
    { "make": "Dodge", "model": "Charger", "colors": ["yellow", "black", "silver"] } ] 
}

The items array can potentially contain thousands of entries and the data within each item can contain up to 60 key/value pairs.

These are just two examples but I need to be able to parse 30-40 different types of JSON files and I can't always control what type of data is in the file. Because of this, I cannot create custom models to bind the data to objects in my app.

What I'm trying to do is create a JsonObject for each item in the items array and add it to a MutableList that I can use in the app. I am currently using the Klaxon Streaming API to try to accomplish this, but I can't seem to find a way to do it without binding to a custom object.

JsonReader(StringReader(testJson)).use { reader ->
        reader.beginObject {
            var result: String? = null
            while (reader.hasNext()) {
                val name = reader.nextName()
                when (name) {
                    "result" -> result = reader.nextString()
                    "items" -> {
                        reader.beginArray {
                            while (reader.hasNext()) {
                                // ???
                            }
                        }
                    }
                }
            }
        }
    }

Solution

  • If you are going to collect all items into a list anyway (instead of processing them immediately, one after another), using the streaming API does not make much sense. It can be done much more simply:

    val response = Klaxon().parseJsonObject(StringReader(testJson))
    val result = response["result"]
    val items = response.array<JsonObject>("items") ?: JsonArray()
    ...
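
To illustrate what can be done with the resulting list when the keys are not known in advance, here is a minimal sketch. It relies on Klaxon's JsonObject behaving like a Map<String, Any?>. The helper name parseItems and the sample JSON are assumptions made up for this example.

```kotlin
import com.beust.klaxon.JsonArray
import com.beust.klaxon.JsonObject
import com.beust.klaxon.Klaxon
import java.io.StringReader

// Hypothetical helper: parses the whole document and copies the
// "items" array into a MutableList<JsonObject>.
fun parseItems(json: String): MutableList<JsonObject> {
    val response = Klaxon().parseJsonObject(StringReader(json))
    val items = response.array<JsonObject>("items") ?: JsonArray()
    return items.toMutableList()
}

fun main() {
    // Sample input mixing two item shapes (assumption for the demo).
    val testJson = """
        { "result": "PASS",
          "items": [
            { "name": "John", "age": 33 },
            { "make": "Ford", "colors": ["blue", "red"] } ] }
    """.trimIndent()

    for (item in parseItems(testJson)) {
        // JsonObject is map-like, so unknown keys can simply be enumerated.
        for ((key, value) in item) {
            println("$key -> $value")
        }
    }
}
```

Nested arrays such as "colors" should come back as JsonArray values, so a type check on each value may be needed before using it.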
    

    Streaming processing is a bit more involved. First, you need to make sure that the server response is not read entirely into memory before processing starts, i.e. the parser input should not be a string but an input stream (the details depend on the HTTP client library of your choice). Second, you need to provide some kind of callback to process the items as they arrive, e.g.:

    fun parse(input: Reader, onResult: (String) -> Unit, onItem: (JsonObject) -> Unit)  {
    
        JsonReader(input).use { reader ->
            reader.beginObject {
                while (reader.hasNext()) {
                    when (reader.nextName()) {
                        "result" -> onResult(reader.nextString())
                        "items" -> reader.beginArray {
                            while (reader.hasNext()) {
                                val item = Parser(passedLexer = reader.lexer, streaming = true).parse(reader) as JsonObject
                                onItem(item)
                            }
                        }
                    }
                }
            }
        }
    }
    
    fun main(args: Array<String>) {
    
        // "input" simulates the server response 
        val input = ByteArrayInputStream(testJson.encodeToByteArray())
    
        InputStreamReader(input).use { reader ->
            parse(reader,
                onResult = { result -> println("Result: $result") },
                onItem = { item -> println(item.asIterable().joinToString(", ")) }
            )
        }
    }
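
The point about not buffering the response as a string can be sketched with nothing but the JDK. The helper below is hypothetical (withResponseReader is not part of Klaxon or of any HTTP library). It opens a URL and hands a Reader to a block, so the body is consumed incrementally. A real app would more likely get the stream from its HTTP client of choice, and on Android the call must run off the main thread.

```kotlin
import java.io.File
import java.io.Reader
import java.net.URL

// Hypothetical helper: opens the URL and passes a buffered Reader to `block`.
// The stream is closed when the block returns or throws.
fun <R> withResponseReader(url: String, block: (Reader) -> R): R =
    URL(url).openStream().bufferedReader(Charsets.UTF_8).use(block)

fun main() {
    // A temporary file stands in for the server so the demo runs offline.
    val file = File.createTempFile("sample", ".json").apply {
        writeText("""{ "result": "PASS", "items": [] }""")
        deleteOnExit()
    }

    // Read only the first character to show that the reader is consumed lazily.
    val first = withResponseReader(file.toURI().toURL().toString()) { reader ->
        reader.read().toChar()
    }
    println(first)
}
```

With the parse function from above, the block would be something like { reader -> parse(reader, onResult = ..., onItem = ...) }.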
    

    An even better approach would be to integrate Klaxon with Kotlin Flow or Sequence, but I found this difficult because of the beginObject and beginArray wrappers, which do not play well with suspend functions.
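
One possible workaround, offered only as a sketch: keep the callback-style parser as it is, run it on a background thread, and expose the items through a lazy Sequence backed by a bounded queue. The helper callbackToSequence below is hypothetical and uses only the standard library. The producer in main is a stand-in for a call such as parse(reader, onResult = {}, onItem = emit).

```kotlin
import java.util.concurrent.ArrayBlockingQueue
import kotlin.concurrent.thread

// Hypothetical helper: adapts a callback-style producer to a lazy Sequence.
// The producer runs on a daemon thread; the bounded queue gives backpressure.
@Suppress("UNCHECKED_CAST")
fun <T : Any> callbackToSequence(
    capacity: Int = 64,
    producer: (emit: (T) -> Unit) -> Unit
): Sequence<T> = sequence {
    val end = Any() // sentinel that marks normal completion
    val queue = ArrayBlockingQueue<Any>(capacity)

    thread(isDaemon = true) {
        try {
            producer { item -> queue.put(item) }
            queue.put(end)
        } catch (t: Throwable) {
            queue.put(t) // forward the failure to the consumer
        }
    }

    while (true) {
        val next = queue.take()
        if (next === end) break
        if (next is Throwable) throw next
        yield(next as T)
    }
}

fun main() {
    // Stand-in producer (assumption); a real one would call the parser.
    val letters = callbackToSequence<String> { emit ->
        listOf("a", "b", "c").forEach(emit)
    }
    println(letters.toList())
}
```

The trade-offs are real: each iteration of the sequence starts the producer again, and if the consumer stops early while the queue is full, the producer thread stays blocked until the process exits. A production version would need cancellation, which is where Flow would be the better fit.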