Tags: json, slice, jq, data-extraction, json-path-expression

How to filter out a JSON based on list of paths in JQ


Given an arbitrary JSON input:

{  
   "id":"038020",
   "title":"Teenage Mutant Ninja Turtles: Out of the Shadows",
   "turtles":[  
      {  
         "name":"Leonardo",
         "mask":"blue"
      },
      {  
         "name":"Michelangelo",
         "mask":"orange"
      },
      {  
         "name":"Donatello",
         "mask":"purple"
      },
      {  
         "name":"Raphael",
         "mask":"red"
      }
   ],
   "summary":"The Turtles continue to live in the shadows and no one knows they were the ones who took down Shredder",
   "cast":"Megan Fox, Will Arnett, Tyler Perry",
   "director":"Dave Green"
}

And given an arbitrary list of jq paths such as [".turtles[].name", ".cast", ".does.not.exist"] (or any similar format), how can I create a new JSON document containing only the information at the paths in the list? In this case, the expected result would be:

{  
   "turtles":[  
      {  
         "name":"Leonardo"
      },
      {  
         "name":"Michelangelo"
      },
      {  
         "name":"Donatello"
      },
      {  
         "name":"Raphael"
      }
   ],
   "cast":"Megan Fox, Will Arnett, Tyler Perry"
}

I've seen similar solutions to problems such as "removing null entries" from JSON, using the walk function available in jq 1.5+, somewhat along the lines of:

def filter_list(input, list):
 input
 | walk(  
     if type == "object" then
       with_entries( select(.key | IN( list )))
     else
       .
     end); 

filter_list([.], [.a, .b, .c[].d])

But it should somehow take into account the full path within the JSON.

What is the best approach to solve this problem?


Solution

  • If $paths contains an array of explicit jq paths (such as [ ["turtles", 0, "name"], ["cast"]]), the simplest approach would be to use the following filter:

    . as $in
    | reduce $paths[] as $p (null; setpath($p; $in | getpath($p)))
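
    The filter above writes a null for any path that is absent from the input, so a path such as ["does", "not", "exist"] would add a "does" key to the result. As a rough sketch of one way to skip such paths, the variant below binds a hypothetical $paths inline and keeps only the paths whose value is not null. It assumes that null values need not be preserved and that no path runs through a string or number (getpath raises an error in that case).

    ```jq
    # Hypothetical list of explicit paths; the last one is absent from the input.
    [ ["turtles", 0, "name"], ["cast"], ["does", "not", "exist"] ] as $paths
    | . as $in
    | reduce ($paths[]
              # Keep a path only if the input holds a non-null value there.
              | select(. as $p | $in | getpath($p) != null)) as $p
        (null; setpath($p; $in | getpath($p)))
    # With the sample input:
    # → {"turtles":[{"name":"Leonardo"}],"cast":"Megan Fox, Will Arnett, Tyler Perry"}
    ```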
    

    Extended path expressions

    In order to be able to handle extended path expressions such as ["turtles", [], "name"], where [] is intended to range over the indices of the turtles array, we shall define the following helper function:

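    # Expand the extended path $ary into a stream of explicit paths;
    # an element equal to [] ranges over the indices of the array at that position.
    def xpath($ary):
      . as $in
      | if ($ary|length) == 0 then null   # [$k] + null is just [$k]
        else $ary[0] as $k
        | if $k == []
          then range(0;length) as $i | $in[$i] | xpath($ary[1:]) | [$i] + .
          else .[$k] | xpath($ary[1:]) | [$k] + .
          end
        end ;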
    

    For the sake of exposition, let us also define:

    # Note: within this program, this definition shadows the builtin paths/1.
    def paths($ary): $ary[] as $path | xpath($path);
    

    Then with the given input, the expression:

    . as $in
    | reduce paths([ ["turtles", [], "name"], ["cast"]]) as $p 
        (null; setpath($p; $in | getpath($p)) )
    

    produces the output shown below.
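
    The question supplies its paths as strings such as ".turtles[].name" rather than as arrays. The following is a minimal sketch of how such strings might be bridged to the extended form. The helper to_xpath is hypothetical (it is neither a jq builtin nor part of the approach above), and it assumes the strings contain only plain identifier keys and "[]". The program repeats xpath so that it can be run on its own, and it drops paths whose value is null so that ".does.not.exist" leaves no trace. It does not guard against a path that runs through a string or number.

    ```jq
    # Hypothetical helper: ".turtles[].name" -> ["turtles", [], "name"].
    # Assumes only plain identifier keys and "[]" occur in the string.
    def to_xpath:
      [ scan("\\[\\]|[A-Za-z_][A-Za-z0-9_]*")
        | if . == "[]" then [] else . end ];

    # xpath/1 as defined above.
    def xpath($ary):
      . as $in
      | if ($ary|length) == 0 then null
        else $ary[0] as $k
        | if $k == []
          then range(0;length) as $i | $in[$i] | xpath($ary[1:]) | [$i] + .
          else .[$k] | xpath($ary[1:]) | [$k] + .
          end
        end ;

    . as $in
    | [".turtles[].name", ".cast", ".does.not.exist"] as $strings
    | reduce ($strings[]
              | to_xpath as $x
              | $in
              | xpath($x)
              # Skip paths that have no non-null value in the input.
              | select(. as $p | $in | getpath($p) != null)) as $p
        (null; setpath($p; $in | getpath($p)))
    ```

    With the sample input, this should yield the expected result from the question: the four turtle names plus "cast".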

    Using path

    It is worth pointing out that one way to handle expressions such as ".turtles[].name" would be to use the builtin filter path/1.

    For example:

    # Emit a stream of paths:
    def paths: path(.turtles[].name), ["cast"];
    
    . as $in
    | reduce paths as $p (null; setpath($p; $in | getpath($p)))
    

    Output:

    {
      "turtles": [
        {
          "name": "Leonardo"
        },
        {
          "name": "Michelangelo"
        },
        {
          "name": "Donatello"
        },
        {
          "name": "Raphael"
        }
      ],
      "cast": "Megan Fox, Will Arnett, Tyler Perry"
    }
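
    To see what path/1 contributes here, it may help to look at the paths it emits before they are fed to reduce. The short sketch below passes several comma-separated expressions to a single path call and collects the results. The third expression, taken from the question, is included to show that path/1 also emits paths that are absent from the input.

    ```jq
    # Collect the explicit paths matched by several path expressions at once.
    [path(.turtles[].name, .cast, .does.not.exist)]
    # With the sample input:
    # → [["turtles",0,"name"],["turtles",1,"name"],["turtles",2,"name"],
    #    ["turtles",3,"name"],["cast"],["does","not","exist"]]
    ```

    Because the missing path is emitted as well, the paths would need to be filtered (for example with select and getpath) if absent entries should not appear as nulls in the result.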