Filter only specific keys from an external file in jq


I have a JSON file with the following format:

[
  {
    "id": "00001",
    "attr": {
      "a": "foo",
      "b": "bar",
      ...
    }
  },
  {
    "id": "00002",
    "attr": {
      ...
    },
    ...
  },
...
]

and a text file with a list of ids, one per line. I'd like to use jq to select only the records whose ids are listed in the text file. For example, if the list contains "00001", only the first record should be printed.

Note that I can't simply use grep, since each record may have an arbitrary number of attributes and sub-attributes.


Solution

  • There are basically two ways to proceed:

    1. read the file of ids from STDIN
    2. read the JSON from STDIN

    Both are feasible, but here we illustrate (2) as it leads to a simple but efficient solution.

    Suppose the JSON file is named in.json and the list of ids is in a file named ids.txt like so:

    00001
    00010
    

    Notice that this file has no quotation marks. If the ids were quoted, the following could be simplified significantly, as shown in the postscript.

    The trick is to convert ids.txt into a JSON array. With the above assumption about quotation marks, this can be done by:

    jq -R . ids.txt | jq -s .
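
To make the conversion concrete, here is a minimal sketch that builds a hypothetical two-line ids.txt and converts it both ways: with the two-process pipeline above, and with a single jq invocation using -n and inputs (this second form assumes jq 1.5 or later). The variable names are illustrative only.

```shell
# Hypothetical sample file: one unquoted id per line.
printf '00001\n00010\n' > ids.txt

# Two-process form from the text: quote each raw line, then slurp into an array.
ids_json=$(jq -R . ids.txt | jq -s -c .)

# Single-process equivalent (assumes jq 1.5+, which provides `inputs`).
ids_json_single=$(jq -n -R -c '[inputs]' ids.txt)

echo "$ids_json"   # ["00001","00010"]
```

Either form yields the same JSON array, so the choice is mostly a matter of taste and jq version.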
    

    Assuming a reasonable shell, a simple solution is now at hand:

    jq --argjson ids "$(jq -R . ids.txt | jq -s .)" '
      map( select( .id as $id | $ids | index($id) ))' in.json
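
As a quick check of the command above, the following sketch runs it against hypothetical sample data (three records, two of which appear in ids.txt). The -c flag is added only to keep the output on one line.

```shell
# Hypothetical sample data, not taken from the question.
printf '%s\n' '[{"id":"00001","attr":{"a":"foo"}},{"id":"00002","attr":{"a":"bar"}},{"id":"00010","attr":{"a":"baz"}}]' > in.json
printf '00001\n00010\n' > ids.txt

# Same filter as above: keep a record when its .id occurs in $ids.
# index/1 returns a position (possibly 0, which jq treats as truthy) or null.
result=$(jq -c --argjson ids "$(jq -R . ids.txt | jq -s .)" '
  map( select( .id as $id | $ids | index($id) ))' in.json)

echo "$result"
```

Records "00001" and "00010" are kept in full, including their attr objects, and "00002" is dropped.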
    

    Faster

    If your jq has any/2, a simpler and more efficient solution can be obtained by defining the following helper, which can stop scanning the ids at the first match:

    def isin($a): . as $in | any($a[]; $in == .);
    

    The required jq filter is then just:

    map( select( .id | isin($ids) ) )
    

    If these two lines of jq are put into a file named select.jq, the required incantation is simply:

    jq --argjson ids "$(jq -R . ids.txt | jq -s .)" -f select.jq in.json
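
Putting the pieces together, here is a sketch that writes the two lines into select.jq and runs the invocation end to end. The sample records are hypothetical, and the def isin($a) syntax assumes jq 1.5 or later.

```shell
# Hypothetical sample data.
printf '%s\n' '[{"id":"00001"},{"id":"00002"},{"id":"00010"}]' > in.json
printf '00001\n00010\n' > ids.txt

# The two-line program from the text: the helper definition, then the filter.
printf '%s\n' \
  'def isin($a): . as $in | any($a[]; $in == .);' \
  'map( select( .id | isin($ids) ) )' > select.jq

# $ids is supplied on the command line, so select.jq can refer to it directly.
filtered=$(jq -c --argjson ids "$(jq -R . ids.txt | jq -s .)" -f select.jq in.json)

echo "$filtered"   # [{"id":"00001"},{"id":"00010"}]
```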
    

    Postscript

    If the index file consists of a stream of valid JSON texts (e.g., strings with quotation marks) and if your jq supports the --slurpfile option, the invocation can be further simplified to:

    jq --slurpfile ids ids.txt -f select.jq in.json 
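
If the ids are not quoted yet, one option is to generate a quoted copy once and point --slurpfile at that copy. The sketch below assumes jq 1.5 or later; the file name ids.json and the sample data are hypothetical.

```shell
# Hypothetical sample data and the select.jq program described above.
printf '%s\n' '[{"id":"00001"},{"id":"00002"},{"id":"00010"}]' > in.json
printf '00001\n00010\n' > ids.txt
printf '%s\n' \
  'def isin($a): . as $in | any($a[]; $in == .);' \
  'map( select( .id | isin($ids) ) )' > select.jq

# One-time conversion: each raw line becomes a JSON string on its own line.
jq -R . ids.txt > ids.json

# --slurpfile binds $ids to an array of all JSON texts found in ids.json.
slurped=$(jq -c --slurpfile ids ids.json -f select.jq in.json)

echo "$slurped"   # [{"id":"00001"},{"id":"00010"}]
```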
    

    Or if you want everything as a one-liner:

    jq --slurpfile ids ids.txt 'map(select(.id as $id|any($ids[];$id==.)))' in.json
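
For completeness, approach (1) from the list at the top, reading the ids from STDIN, might look like the sketch below. It is not part of the solution above: it assumes jq 1.5 or later (for inputs and --slurpfile), and the variable name $data and the sample records are hypothetical.

```shell
# Hypothetical sample data.
printf '%s\n' '[{"id":"00001"},{"id":"00002"},{"id":"00010"}]' > in.json
printf '00001\n00010\n' > ids.txt

# -R reads STDIN as raw lines; -n keeps jq from consuming the first line
# itself, so `inputs` sees every id. --slurpfile wraps the JSON texts of
# in.json in an array, hence $data[0] for the single top-level array.
from_stdin=$(jq -c -n -R --slurpfile data in.json '
  [inputs] as $ids
  | $data[0]
  | map( select( .id as $id | any($ids[]; . == $id) ) )' < ids.txt)

echo "$from_stdin"   # [{"id":"00001"},{"id":"00010"}]
```

This variant needs no quoting step for ids.txt, at the cost of loading in.json through --slurpfile rather than streaming it from STDIN.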