
Get JSON files from a particular interval based on a date field


I have a lot of JSON files whose structure looks like this:

{
  key1: 'val1'
  key2: {
          'key21': 'someval1',
          'key22': 'someval2',
          'key23': 'someval3',
          'date': '2018-07-31T01:30:30Z',
          'key25': 'someval4'
  }
  key3: []
  ... some other objects
 }          

My goal is to get only those files whose date field falls within a given period, for example from 2018-05-20 to 2018-07-20. I can't rely on the files' creation dates, because all of them were generated on the same day. Is this possible with sed or a similar program?


Solution

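  • Fortunately, dates in this format can be compared as strings: the fields are zero-padded, fixed-width and ordered from year down to second, and all dates share the same Z time zone, so lexicographic order matches chronological order. You only need something to parse the JSON, e.g. Perl: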

    perl -l -0777 -MJSON::PP -ne '
       $date = decode_json($_)->{key2}{date};
       print $ARGV if $date gt "2018-07-01T00:00:00Z";
    ' *.json
    
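    • -0777 makes perl slurp each file whole instead of reading it line by line
    • -l adds a newline to each print (it must come before -0777, because it copies the input record separator at the moment it is processed)
    • $ARGV contains the name of the currently processed file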

    See JSON::PP for details. If you have JSON::XS or Cpanel::JSON::XS, you can switch to them for faster processing.

    I had to fix the input (replace ' with ", quote the keys, add the missing commas, etc.) to make the parser accept it.