Search code examples
jsonnestedkeyjq

Extract schema of nested JSON object


Let's assume this is the source json file:

{    
    "name": "tom",
    "age": 12,
    "visits": {
        "2017-01-25": 3,
        "2016-07-26": 4,
        "2016-01-24": 1
    }
}

I want to get:

[
  "age",
  "name",
  "visits.2017-01-25",
  "visits.2016-07-26",
  "visits.2016-01-24"
]

I am able to extract the keys using: jq '. | keys' file.json, but this skips nested fields. How to include those?


Solution

  • With your input, the invocation:

    jq 'leaf_paths | join(".")'
    

    produces:

    "name"
    "age"
    "visits.2017-01-25"
    "visits.2016-07-26"
    "visits.2016-01-24"
    

    If you want to include "visits", use paths. If you want the result as a JSON array, enclose the filter with square brackets: [ ... ]

    If your input might include arrays, then unless you are using jq 1.6 or later, you will need to convert the integer indices to strings explicitly; also, since leaf_paths is now deprecated, you might want to use its def. The result:

    jq 'paths(scalars) | map(tostring) | join(".")'
    

    allpaths

    To include paths to null, you could use allpaths defined as follows:

    def allpaths:
      def conditional_recurse(f):  def r: ., (select(.!=null) | f | r); r;
      path(conditional_recurse(.[]?)) | select(length > 0);
    

    Example:

    {"a": null, "b": false} | allpaths | join(".")
    

    produces:

    "a"
    "b"
    

    all_leaf_paths

    Assuming jq version 1.5 or higher, we can get to all_leaf_paths by following the strategy used in builtin.jq, that is, by adding these definitions:

    def allpaths(f):
      . as $in | allpaths | select(. as $p|$in|getpath($p)|f);
    
    def isscalar:
      . == null or . == true or . == false or type == "number" or type == "string";
    
    def all_leaf_paths: allpaths(isscalar);
    

    Example:

    {"a": null, "b": false, "object":{"x":0} } | all_leaf_paths | join(".")
    

    produces:

    "a"
    "b"
    "object.x"