java json gson

Best approach to parse huge (extra large) JSON file


I'm trying to parse a huge JSON file (such as http://eu.battle.net/auction-data/258993a3c6b974ef3e6f22ea6f822720/auctions.json) using the Gson library (http://code.google.com/p/google-gson/) in Java.

I would like to know the best approach to parsing a file this large (about 80k lines), and whether there is a good API that can help me process it.

Some ideas

  1. Read the file line by line and strip the JSON formatting: but that's nonsense.
  2. Reduce the JSON file by splitting it into many smaller files: but I did not find any good Java API for this.
  3. Use the file directly as a NoSQL database: keep the file and query it as my database.

Solution

  • You don't need to switch to Jackson. Gson 2.1 introduced a new TypeAdapter class that permits mixed tree and streaming serialization and deserialization.

    The API is efficient and flexible. See Gson's Streaming doc for an example of combining streaming and binding modes. This is strictly better than mixing streaming and tree modes: with binding, you don't waste memory building an intermediate representation of your values.

    Like Jackson, Gson has APIs to recursively skip an unwanted value; Gson calls this skipValue().
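
As a rough illustration of the answer above, here is a minimal sketch that streams through a large document with JsonReader, binds each array element to an object with Gson.fromJson, and drops everything else with skipValue(). The document shape ({"realm": ..., "auctions": [...]}), the Auction class, and its field names are assumptions made for the example, not the real feed's schema; the names AuctionStreamSketch and streamAuctions are likewise hypothetical.

```java
import com.google.gson.Gson;
import com.google.gson.stream.JsonReader;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

class AuctionStreamSketch {

    // Hypothetical record type: these field names are assumptions,
    // not the actual schema of the auction file.
    static class Auction {
        long auc;
        long item;
        long bid;
    }

    /**
     * Streams a document assumed to look like
     * {"realm": {...}, "auctions": [ {...}, {...}, ... ]}
     * and passes each array element to the handler as soon as it is bound,
     * so only one Auction is held in memory at a time.
     * Returns the number of elements processed.
     */
    static int streamAuctions(Reader in, Consumer<Auction> handler) throws IOException {
        Gson gson = new Gson();
        int count = 0;
        try (JsonReader reader = new JsonReader(in)) {
            reader.beginObject();
            while (reader.hasNext()) {
                String name = reader.nextName();
                if (name.equals("auctions")) {
                    reader.beginArray();
                    while (reader.hasNext()) {
                        // Binding mode: Gson consumes exactly one element from the stream.
                        Auction auction = gson.fromJson(reader, Auction.class);
                        handler.accept(auction);
                        count++;
                    }
                    reader.endArray();
                } else {
                    // Recursively skip any value we are not interested in.
                    reader.skipValue();
                }
            }
            reader.endObject();
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Tiny in-memory stand-in for the real file.
        String json = "{\"realm\":{\"name\":\"x\"},"
                + "\"auctions\":[{\"auc\":1,\"item\":10,\"bid\":100},"
                + "{\"auc\":2,\"item\":20,\"bid\":250}]}";
        long[] totalBid = {0};
        int n = streamAuctions(new StringReader(json), a -> totalBid[0] += a.bid);
        System.out.println(n + " auctions, total bid " + totalBid[0]);
    }
}
```

For a real file, the StringReader would be replaced by a buffered file or network reader. Because the handler sees each element as it is read, memory use should stay roughly proportional to one element rather than to the whole file.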