How to use 'select' properly in jq?


Given JSON like this:

{
    "project" : [
        {
            "name" : "dungeon master",
            "date" : "2022-05-11T16:07:29.000Z",
            "status" : "active"
        },
        {
            "name" : "great gatsby",
            "date" : "2023-04-20T19:49:26.000Z",
            "status" : "active"
        },
        {
            "name" : "hundred years",
            "date" : "2022-11-29T23:37:29.000Z",
            "status" : "active"
        },
        {
            "name" : "passage to india",
            "date" : "2022-11-22T04:16:50.000Z",
            "status" : "active"
        }
    ]
}

To get the 'name' values that contain the string 'ma', I built up the jq filter step by step:

.project[] | .name |contains("ma")
>>true
false
false
false

.project[] | select( .name |contains("ma"))
>>{
  "name": "dungeon master",
  "date": "2022-05-11T16:07:29.000Z",
  "status": "active"
}

.project[] | select( .name |contains("ma")) |.name
>> "dungeon master"
#success

But if I use the abbreviated form, it doesn't work:

.project[].name |contains("ma")
>>true
false
false
false

select(.project[].name |contains("ma"))
>>get back whole json 

select(.project[].name |contains("ma"))|.name 
>> null 

Using 'select' on the abbreviated form '.array[].key' seems to produce totally different output. I think I am missing something important about how jq works here.

My questions:

  1. Why doesn't select(.project[].name |contains("ma"))|.name work? What is the misunderstanding behind that usage?
  2. What is the conceptual difference between .array[].key and .array[] | .key that causes the above not to work?

Solution

  • A common misconception is that select descends into the input structure, "selects from" its subordinate items the ones that match a condition, and returns them. This notion probably comes from the function's name; "filter" or "assert" might have been a better choice, because the structures used to evaluate the condition and the value returned when the condition is met are in no way related. The condition may descend arbitrarily deep (or not at all, or use variables holding "external" values), whereas the output is always either the entire, unaltered input (if the condition was met) or nothing (no output at all, just like with empty). Moreover, if the condition produces a stream of values, the output is still the input value, emitted once for each truthy value in that stream (the single-value case is no exception: it is a stream of just one value). That said, let's answer your questions:
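The behaviour described above can be sketched with a small input. This is a minimal illustration, assuming jq is installed and a POSIX shell; the variable name input and its contents are made up for the example and are not part of the question's data.

```shell
# Hypothetical input, not taken from the question
input='{"a": 2, "b": [1, 2, 3]}'

# Condition is true once: the whole input is emitted, unaltered
printf '%s' "$input" | jq -c 'select(.a > 1)'
# prints {"a":2,"b":[1,2,3]}

# Condition is false: no output at all (like empty)
printf '%s' "$input" | jq -c 'select(.a > 5)'

# Condition is a stream (false, true, true): the input is emitted twice
printf '%s' "$input" | jq -c 'select(.b[] > 1)'
# prints {"a":2,"b":[1,2,3]} two times
```

Note that in the last call the condition descends into .b, yet the output is still the whole top-level object, never an element of .b.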

    1. Why doesn't select(.project[].name |contains("ma"))|.name work? What is the misunderstanding behind that usage?

    Starting with the top-level object as the current context, select either reproduces it or produces no output at all, depending on the boolean evaluation of the condition. Then, at |, whatever output there is becomes the new context (here, still the same object) and is taken as input for .name, which produces the value of that field, or null if there is no field of that name, or an error if the input is neither an object nor null.
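The three possible outcomes of .name can be checked directly. This is a small sketch, assuming jq is installed; it uses jq -n with literal values made up for the example instead of reading a file.

```shell
# Object with the field: produces its value
jq -n '{name: "dungeon master"} | .name'
# prints "dungeon master"

# Object without the field: produces null
jq -n '{project: []} | .name'
# prints null

# null input: also produces null
jq -n 'null | .name'
# prints null

# Neither an object nor null: jq reports an error and exits non-zero
jq -n '[1] | .name' 2>/dev/null || echo "error: cannot index an array with a string key"
```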

    Given your sample data, with the object {"project": …} as context, .project[].name | contains("ma") produces true exactly once (and false three times), so select reproduces its input exactly once. However, as this object does not have a field called name, .name produces null, also once.

    As an exercise, try changing either the string values present in the fields, or the substring searched for, so that more than one item matches, and you will get as many null outputs - and likewise no output if there is no match.
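One way to run this exercise is sketched below, assuming jq is installed and a POSIX shell. The variable json holds the question's data trimmed to the name fields, and the substrings "ea" and "zz" are chosen here only for illustration.

```shell
# The question's data, trimmed to the name fields for brevity
json='{"project":[{"name":"dungeon master"},{"name":"great gatsby"},{"name":"hundred years"},{"name":"passage to india"}]}'

# One match ("dungeon master"): one null
printf '%s' "$json" | jq 'select(.project[].name | contains("ma")) | .name'

# Two matches ("great gatsby", "hundred years"): two nulls
printf '%s' "$json" | jq 'select(.project[].name | contains("ea")) | .name'

# No match: no output at all
printf '%s' "$json" | jq 'select(.project[].name | contains("zz")) | .name'
```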

    Now, instead of trying to extract .name from there, you could, of course, descend again to its actual location, but at this point you have lost the instances that conveyed where the condition was met. With your successful approach, .project[] | select( .name |contains("ma")) |.name, you

    • .project[]: descended into the array items, producing a stream of (in this case: four) objects as the new contexts,
    • select( .name |contains("ma")): reproduced the same objects if and only if their name field's value contained the given substring (here: one object makes it),
    • .name: extracted the desired value in the right context.
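The difference between descending after select and descending before it can be sketched as follows, assuming jq is installed and a POSIX shell; json again holds the question's data trimmed to the name fields.

```shell
# The question's data, trimmed to the name fields for brevity
json='{"project":[{"name":"dungeon master"},{"name":"great gatsby"},{"name":"hundred years"},{"name":"passage to india"}]}'

# Descending again after select: the match position is lost, all names come back
printf '%s' "$json" | jq -r 'select(.project[].name | contains("ma")) | .project[].name'
# prints all four names

# Descending first, then selecting per item: only the matching name comes back
printf '%s' "$json" | jq -r '.project[] | select(.name | contains("ma")) | .name'
# prints dungeon master
```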

    To avoid code duplication, you could even move the extraction of .name in front of the condition, so that it is both the context for select and the final output:

    jq -r '.project[].name | select(contains("ma"))'
    
    dungeon master
    


    2. What is the conceptual difference between .array[].key and .array[] | .key that causes the above not to work?

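    As described above, | changes the context. On their own, .array[].key and .array[] | .key produce the same output; the difference only shows once something else depends on the context, such as a select or a second path. The comma binds more tightly than the pipe, so .array[].key1, .key2 is read as (.array[].key1), (.key2): .key2 is evaluated in the same context as the whole of .array[].key1, i.e. the original input. In .array[] | .key1, .key2, by contrast, both .key1 and .key2 sit to the right of the pipe, so both are evaluated in the context of each item produced by .array[] (a stream of possibly multiple items).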
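A minimal sketch of that difference, assuming jq is installed and a POSIX shell. The input is hypothetical: it adds a top-level key2 field so that both contexts have something to show, and the results are collected into arrays only to keep the output on one line.

```shell
# Hypothetical input with key2 both inside the items and at the top level
json='{"array":[{"key1":1,"key2":"a"},{"key1":2,"key2":"b"}],"key2":"top"}'

# .key2 is evaluated against the original input
printf '%s' "$json" | jq -c '[.array[].key1, .key2]'
# prints [1,2,"top"]

# Both keys are evaluated against each array item
printf '%s' "$json" | jq -c '[.array[] | .key1, .key2]'
# prints [1,"a",2,"b"]
```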