Get value of JSON object using jq --stream


I'm trying to extract the value of a JSON object using jq --stream, because the real data can be several gigabytes in size.

This is the JSON I'm using for my tests, where I want to extract the value of item:

{
  "other": "content here",
  "item": {
    "A": {
      "B": "C"
    }
  },
  "test": "test"
}

The jq command I'm using:

jq --stream --null-input 'fromstream(inputs | select(.[0][0] == "item"))[]' example.json

However, I don't get any output with this command.

Strangely, when I remove the key that follows item (here, "test"), the same command works:

{
  "other": "content here",
  "item": {
    "A": {
      "B": "C"
    }
  }
}

The result looks as expected:

❯ jq --stream --null-input 'fromstream(inputs | select(.[0][0] == "item"))[]' example.json
{
  "A": {
    "B": "C"
  }
}

But since I cannot control the input JSON, this is not a solution.

I'm using jq version 1.6 on macOS.


Solution

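  • You didn't truncate the stream. The event that closes the top-level object carries the path of its last key, so with your input it is [["test"]], which the select filters out. fromstream therefore never sees the final back-tracking event [["item"]] and emits nothing. When item is the last key, that closing event is [["item"]] and passes the filter, which is why your second example works. You could append the missing event manually (not recommended, as the result would also include the top-level object), but it is much simpler to use 1 | truncate_stream to strip the first level altogether: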

    jq --stream --null-input '
      fromstream(1 | truncate_stream(inputs | select(.[0][0] == "item")))
    ' example.json
    
    {
      "A": {
        "B": "C"
      }
    }
    

    Alternatively, you can use reduce and setpath to build up the result object yourself:

    jq --stream --null-input '
      reduce inputs as $in (null;
        # has(1) is true only for leaf events, i.e. [path, value] pairs
        if $in | .[0][0] == "item" and has(1) then setpath($in[0];$in[1]) else . end
      )
    ' example.json
    
    {
      "item": {
        "A": {
          "B": "C"
        }
      }
    }
    

    To drop the top-level object, either filter for .item at the end or, as with truncate_stream, strip the first path component using [1:]:

    jq --stream --null-input '
      reduce inputs as $in (null;
        if $in | .[0][0] == "item" and has(1) then setpath($in[0][1:];$in[1]) else . end
      )
    ' example.json
    
    {
      "A": {
        "B": "C"
      }
    }
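
To see why the original filter printed nothing, it can help to dump the raw streaming events. The following is only a sketch: it recreates the question's test file under the name example.json, and the shell variable names (all_events, filtered, truncated) are made up for illustration. It prints all events, then the events that survive the select, then the same events after truncate_stream.

```shell
# Recreate the test input from the question ("item" is not the last key).
cat > example.json <<'EOF'
{
  "other": "content here",
  "item": {
    "A": {
      "B": "C"
    }
  },
  "test": "test"
}
EOF

# 1. Every event jq emits in streaming mode, one per line.
all_events=$(jq --stream --compact-output . example.json)

# 2. Only the events below .item, as selected by the original filter.
filtered=$(jq --stream --null-input --compact-output \
  'inputs | select(.[0][0] == "item")' example.json)

# 3. The same events with the first path component stripped.
truncated=$(jq --stream --null-input --compact-output \
  '1 | truncate_stream(inputs | select(.[0][0] == "item"))' example.json)

# Print the three listings, separated by blank lines.
printf '%s\n\n' "$all_events" "$filtered" "$truncated"
```

With the question's input, the three listings should look like this:

    [["other"],"content here"]
    [["item","A","B"],"C"]
    [["item","A","B"]]
    [["item","A"]]
    [["test"],"test"]
    [["test"]]

    [["item","A","B"],"C"]
    [["item","A","B"]]
    [["item","A"]]

    [["A","B"],"C"]
    [["A","B"]]
    [["A"]]

The second listing ends with a closing event whose path has two components, so fromstream keeps waiting for a top-level close that never arrives. After truncation the last event is [["A"]], a closing event with a single path component, and that is what lets fromstream emit the object.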