
Processing large JSON files in PHP


I am trying to process fairly large JSON files (possibly up to 200 MB). Each file is essentially an array of objects.

So something along the lines of:

[
  {"property":"value", "property2":"value2"},
  {"prop":"val"},
  ...
  {"foo":"bar"}
]

Each object has arbitrary properties and does not necessarily share them with the other objects in the array.

I want to apply some processing to each object in the array. Because the file is potentially huge, I cannot slurp its whole content into memory, decode the JSON, and iterate over the resulting PHP array.

Ideally, I would like to read the file, fetch just enough data for each object, and process it. A SAX-style approach would be fine if a similar library were available for JSON.

Any suggestions on how best to deal with this problem?


Solution

  • I decided to write an event-based parser. It's not quite done yet; I will edit the question with a link to my work when I roll out a satisfactory version.

    EDIT:

    I finally worked out a version of the parser that I am satisfied with. It's available on GitHub:

    https://github.com/kuma-giyomu/JSONParser

    There's probably room for improvement, and I welcome feedback.
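
For readers who only need the "one object at a time" behaviour described in the question, here is a minimal sketch in plain PHP. It is not the API of the parser linked above and it is not an event-based parser: it is a simpler chunked scanner that tracks nesting depth and string state, then hands each complete object to json_decode. The function name streamJsonObjects and its parameters are made up for this illustration. It assumes the top-level array contains only objects (as in the question) and PHP 7.3 or later for JSON_THROW_ON_ERROR.

```php
<?php
/**
 * Hypothetical helper (not part of JSONParser): yields each top-level object
 * of a JSON array as a PHP array, reading the stream in fixed-size chunks.
 * Assumption: the top-level array contains only objects.
 */
function streamJsonObjects($handle, int $chunkSize = 8192): Generator
{
    $depth = 0;        // nesting depth of {...} / [...] inside the current object
    $inString = false; // true while inside a JSON string literal
    $escaped = false;  // true right after a backslash inside a string
    $buffer = '';      // raw text of the object currently being collected

    // fread() returns '' at end of file and false on error; stop on both.
    while (($chunk = fread($handle, $chunkSize)) !== false && $chunk !== '') {
        $length = strlen($chunk);
        for ($i = 0; $i < $length; $i++) {
            $char = $chunk[$i];

            if ($inString) {
                // Inside a string, braces and brackets are plain text.
                $buffer .= $char;
                if ($escaped) {
                    $escaped = false;
                } elseif ($char === '\\') {
                    $escaped = true;
                } elseif ($char === '"') {
                    $inString = false;
                }
                continue;
            }

            // Between objects: skip the outer brackets, commas and whitespace.
            if ($depth === 0 && $char !== '{') {
                continue;
            }

            $buffer .= $char;
            if ($char === '"') {
                $inString = true;
            } elseif ($char === '{' || $char === '[') {
                $depth++;
            } elseif ($char === '}' || $char === ']') {
                $depth--;
                if ($depth === 0) {
                    // One complete object collected: decode it and reset.
                    yield json_decode($buffer, true, 512, JSON_THROW_ON_ERROR);
                    $buffer = '';
                }
            }
        }
    }
}
```

To use it, open the file with fopen($path, 'rb'), loop with foreach (streamJsonObjects($handle) as $object), and close the handle afterwards. Only the current chunk and the text of one object are held in memory at a time, so peak usage depends on the largest single object rather than on the file size. The scanner does not validate the document outside the objects; malformed text inside an object makes json_decode throw a JsonException.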