Process a huge GeoJSON file with jq

Given a GeoJSON file as follows:

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {
        "FEATCODE": 15014
      },
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          .....

I want to end up with the following:

{
  "type": "FeatureCollection",
  "features": [
    {
      "tippecanoe": {"minzoom": 13},
      "type": "Feature",
      "properties": {
        "FEATCODE": 15014
      },
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          .....

That is, a tippecanoe object has been added to each feature in the features array.

I can make this work with:

 jq '.features[].tippecanoe.minzoom = 13' <GEOJSON FILE> > <OUTPUT FILE>

This is fine for small files, but processing a large file of 414 MB seems to take forever, with the processor maxed out and nothing written to the output file.

Reading further into jq, it appears that the --stream command-line option may help, but I am completely confused as to how to use it for my purposes.

I would be grateful for an example command line that serves my purposes along with an explanation as to what --stream is doing.

Solution

A one-pass jq-only approach may require more RAM than is available. If so, two alternatives are shown below: a simple all-jq pipeline, and a more economical one that combines jq with awk.

The two approaches are the same except for the final step, which reconstitutes the stream of objects into a single JSON document. With jq, this step uses -s (slurp), which has to hold every object in memory at once. awk can do it very economically, because it only ever handles one line at a time.

In both cases, the large JSON input file with objects of the required form is assumed to be named input.json.

jq-only

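jq -c '.features[]' input.json |          # emit one feature per line
    jq -c '.tippecanoe.minzoom = 13' |    # add the tippecanoe object to each feature
    jq -c -s '{type: "FeatureCollection", features: .}'   # slurp all features back into one document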
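
Note that the first stage above, jq -c '.features[]', still parses the whole of input.json before it emits anything. If even that is too much, jq's streaming parser (--stream) can take over that stage. What follows is a minimal sketch rather than a tested recipe for the 414 MB file. It assumes jq 1.6 or later (as far as I know, truncate_stream in jq 1.5 did not handle depths other than 1 correctly). The helper name stream_features and the tiny sample file are made up for illustration.

```shell
# Tiny stand-in for input.json (hypothetical sample data).
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"type":"FeatureCollection","features":[
  {"type":"Feature","properties":{"FEATCODE":15014}},
  {"type":"Feature","properties":{"FEATCODE":15015}}
]}
EOF

# stream_features is a hypothetical helper name, not part of jq.
# --stream makes jq emit [path, leaf] events as it reads, instead of
#   building the whole document in memory first.
# select(...) keeps only the events whose path starts with "features".
# 2 | truncate_stream(...) strips the leading "features" and array-index
#   path components, and drops events whose path is no longer than that.
# fromstream(...) reassembles one complete object per feature.
stream_features() {
  jq -cn --stream '
    fromstream(2 | truncate_stream(inputs | select(.[0][0] == "features")))
    | .tippecanoe.minzoom = 13' "$1"
}

stream_features "$sample"
```

This prints one modified feature per line, so its output can be fed into either of the reassembly steps (jq -s or awk) in place of the first two stages.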

jq and awk

jq -c '.features[]' input.json |
   jq -c '.tippecanoe.minzoom = 13' | awk '
     BEGIN {print "{\"type\": \"FeatureCollection\", \"features\": ["; }
     NR==1 { print; next }
           {print ","; print}
     END   {print "] }";}'
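
To see what the awk step does on its own, here is a small sketch that feeds it two hand-written features instead of jq output. The wrapper name wrap_features and the sample lines are assumptions for illustration. The awk program is the same as above, with comments added.

```shell
# wrap_features is a hypothetical helper name wrapping the awk program above.
wrap_features() {
  awk '
    BEGIN { print "{\"type\": \"FeatureCollection\", \"features\": [" }
    NR==1 { print; next }        # first feature: print it with no leading comma
          { print ","; print }   # later features: a comma line, then the feature
    END   { print "] }" }'
}

# Two stand-in features, one per line, in the shape jq -c would emit them.
printf '%s\n' \
  '{"type":"Feature","properties":{"FEATCODE":15014},"tippecanoe":{"minzoom":13}}' \
  '{"type":"Feature","properties":{"FEATCODE":15015},"tippecanoe":{"minzoom":13}}' |
  wrap_features
```

Because awk only prints each line as it arrives, memory use stays flat however many features pass through. The catch is that it relies on each feature being on exactly one line, which is why the jq stages use -c.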

Performance comparison

For comparison, an input file with 10,000,000 objects in .features[] was used. Its size is about 1 GB.

u+s (user + system CPU time):

jq-only:              15m 15s
jq-awk:                7m 40s
jq one-pass using map: 6m 53s
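
The one-pass command behind the last timing is not shown above. A plausible form, and this is my assumption rather than the exact command that was timed, updates the features array in place with map. The helper name one_pass and the sample file are hypothetical.

```shell
# Tiny stand-in for input.json (hypothetical sample data).
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"type":"FeatureCollection","features":[
  {"type":"Feature","properties":{"FEATCODE":15014}},
  {"type":"Feature","properties":{"FEATCODE":15015}}
]}
EOF

# one_pass is a hypothetical helper name.
# Assumed form of the "one-pass using map" variant: the whole document
# is parsed into memory, then every feature gets the tippecanoe object.
one_pass() {
  jq -c '.features |= map(.tippecanoe.minzoom = 13)' "$1"
}

one_pass "$sample"
```

This is the simplest of the variants, but it needs enough RAM to hold the entire parsed document, which is the constraint the pipelines above are meant to work around.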