Tags: json, bash, command-line-interface, jq

Why doesn't the jq filter 'exp as $var | .' print the original input as documented?


jq's documentation says (emphasis mine):

The expression exp as $x | ... means: for each value of expression exp, run the rest of the pipeline with the entire original input, and with $x set to that value.

Here's my own interpretation: in exp as $x | ..., the ...'s context starts at the original input to the jq command. If that's the case, why doesn't exp as $x | . print the original input? Here's my example (playground link):

Filter: (.signal[] | select(.name | contains("signal"))) as $matches | .

Input:

{
  "noise1": 5,
  "signal": [
    {
      "name": "child-signal-1",
      "nested": {
        "prop": "child-prop-3"
      }
    },
    {
      "name": "child-noise-2",
      "nested": {
        "prop": "child-prop-3"
      }
    },
    {
      "name": "child-signal-3",
      "nested": {
        "prop": "child-prop-3"
      }
    }
  ],
  "noise2": [
    {
      "name": "child-signal-1",
      "nested": {
        "prop": "child-prop-3"
      }
    },
    {
      "name": "child-noise-2",
      "nested": {
        "prop": "child-prop-3"
      }
    },
    {
      "name": "child-signal-3",
      "nested": {
        "prop": "child-prop-3"
      }
    }
  ]
}

Output:

{
  "noise1": 5,
  "signal": [
    {
      "name": "child-signal-1",
      "nested": {
        "prop": "child-prop-3"
      }
    },
    {
      "name": "child-noise-2",
      "nested": {
        "prop": "child-prop-3"
      }
    },
    {
      "name": "child-signal-3",
      "nested": {
        "prop": "child-prop-3"
      }
    }
  ],
  "noise2": [
    {
      "name": "child-signal-1",
      "nested": {
        "prop": "child-prop-3"
      }
    },
    {
      "name": "child-noise-2",
      "nested": {
        "prop": "child-prop-3"
      }
    },
    {
      "name": "child-signal-3",
      "nested": {
        "prop": "child-prop-3"
      }
    }
  ]
}
{
  "noise1": 5,
  "signal": [
    {
      "name": "child-signal-1",
      "nested": {
        "prop": "child-prop-3"
      }
    },
    {
      "name": "child-noise-2",
      "nested": {
        "prop": "child-prop-3"
      }
    },
    {
      "name": "child-signal-3",
      "nested": {
        "prop": "child-prop-3"
      }
    }
  ],
  "noise2": [
    {
      "name": "child-signal-1",
      "nested": {
        "prop": "child-prop-3"
      }
    },
    {
      "name": "child-noise-2",
      "nested": {
        "prop": "child-prop-3"
      }
    },
    {
      "name": "child-signal-3",
      "nested": {
        "prop": "child-prop-3"
      }
    }
  ]
}

I'd also like to understand why my output prints the input twice, which makes it invalid JSON. I think that relates to (.signal[] | select(.name | contains("signal"))) returning 2 results, but I don't understand how that's relevant if I'm not referencing the variable on the RHS.


Solution

  • Simpler example:

    printf '%s\n' '[ "abc", "def" ]' | jq -c '.[] as $x | .'
    
    ["abc","def"]
    ["abc","def"]
    

    We could also write this as follows:

    printf '%s\n' '[ "abc", "def" ]' | jq -rc '.[] as $x | ". is " + tojson'
    
    . is ["abc","def"]
    . is ["abc","def"]
    

    You say . isn't producing the original input, but we clearly see it is in this latest snippet.

    What you're really asking is why the expression that follows the | is evaluated multiple times. This results from the use of .[], which the documentation describes as follows:

    If you use the .[index] syntax, but omit the index entirely, it will return all of the elements of an array. Running .[] with the input [1,2,3] will produce the numbers as three separate results, rather than as a single array. A filter of the form .foo[] is equivalent to .foo | .[].
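
    As a quick check of that description, here is a minimal sketch. It assumes jq is on your PATH; the variable names and the {"foo": ...} input are mine, chosen only for illustration.

```shell
# .[] emits each array element as a separate result, not as one array.
each=$(printf '%s\n' '[1,2,3]' | jq -c '.[]')
printf '%s\n' "$each"

# .foo[] and .foo | .[] should produce the same stream of results.
input='{"foo":["abc","def"]}'
short=$(printf '%s\n' "$input" | jq -c '.foo[]')
long=$(printf '%s\n' "$input" | jq -c '.foo | .[]')
printf '%s\n' "$short"
[ "$short" = "$long" ] && echo "same results"
```

    The first command prints three lines, one per number, which is the "multiple results" behaviour everything else here depends on.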

    When something produces multiple values, the rest of the pipeline is evaluated for each of those values.[1] The documentation puts it this way:

    Some filters produce multiple results, for instance there's one that produces all the elements of its input array. Piping that filter into a second runs the second filter for each element of the array. Generally, things that would be done with loops and iteration in other languages are just done by gluing filters together in jq.
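
    To see the second filter run once per element, here is a small sketch with the same input as above (again assuming jq is installed; the variable name is only for illustration):

```shell
# .[] produces two values, so the filter after the pipe runs twice,
# each time with . set to one element of the array.
shouted=$(printf '%s\n' '[ "abc", "def" ]' | jq -r '.[] | . + "!"')
printf '%s\n' "$shouted"
```

    Note the difference from the as form: after a plain pipe, . is the element, not the original array.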

    The documentation for as repeats this:

    The expression exp as $x | ... means: for each value of expression exp, run the rest of the pipeline with the entire original input, and with $x set to that value. Thus as functions as something of a foreach loop.
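
    One way to see both halves of that sentence at once (the loop, and the "entire original input") is to bind two variables in a row. This is only a sketch, assuming jq is installed; $y and the variable name pairs are mine.

```shell
# After `.[] as $x |`, . is still the whole array, so the second .[]
# iterates over it again: two nested loops, four results.
pairs=$(printf '%s\n' '[ "abc", "def" ]' | jq -c '.[] as $x | .[] as $y | [ $x, $y ]')
printf '%s\n' "$pairs"
```

    If . had been replaced by the current element, the inner .[] would be applied to a string and fail, rather than producing every pairing.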

    Let's look at a slightly more elaborate version of my original example:

    printf '%s\n' '[ "abc", "def" ]' | jq -c '.[] as $x | [ ., $x ]'
    
    [["abc","def"],"abc"]
    [["abc","def"],"def"]
    

    The documentation I've quoted says this should be equivalent to the following:

    printf '%s\n' '[ "abc", "def" ]' | jq -c '
       . as $orig |  # Save original input.
       .[] |         # Produces multiple values.
       .  as $x |    # This is evaluated for each of those values.
       $orig |       # Set `.` to the original input.
       [ ., $x ]
    '
    
    [["abc","def"],"abc"]
    [["abc","def"],"def"]
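
    If the goal is to run the rest of the pipeline only once, one option is to wrap the expression in [...] so that it produces a single array instead of a stream of values. The sketch below assumes jq is installed and uses a shortened stand-in for the question's input, not the full document.

```shell
# Shortened stand-in for the question's input (an assumption for brevity).
input='{"noise1":5,"signal":[{"name":"child-signal-1"},{"name":"child-noise-2"},{"name":"child-signal-3"}]}'

# Stream of two matches: the body after `as` runs twice, so . is printed twice.
twice=$(printf '%s\n' "$input" | jq -c '(.signal[] | select(.name | contains("signal"))) as $matches | .')

# Array of matches: the expression yields exactly one value, so the body runs once.
once=$(printf '%s\n' "$input" | jq -c '[.signal[] | select(.name | contains("signal"))] as $matches | .')

printf '%s\n' "$twice" | wc -l   # how many results the first filter printed
printf '%s\n' "$once" | wc -l    # how many results the second filter printed
```

    In the second form $matches holds an array of both matching objects, so anything that later uses it has to index or iterate it explicitly.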
    

    1. And when something produces no values, the rest of the pipeline isn't evaluated.
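
    A short sketch of that footnote, assuming jq is installed (the variable names are only for illustration):

```shell
# `empty` produces no values, so the body after `as $x |` never runs.
none=$(printf '%s\n' '[ "abc", "def" ]' | jq -c 'empty as $x | .')

# Iterating an empty array with .[] also yields no values.
also_none=$(printf '%s\n' '[]' | jq -c '.[] as $x | "never printed"')

# Both captures are empty strings.
printf 'results: [%s] [%s]\n' "$none" "$also_none"
```

    This is why a select that matches nothing on the left of as makes the whole filter print nothing, even if the right-hand side is just a bare dot.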