Search code examples
jsondategroupingaggregatejq

JSON parsing - group by date range with JQ


I have the following JSON data :

{
  "data": [  
    {"date": "2018-08-22","value": 1},
    {"date": "2018-08-30","value": 2},
    {"date": "2018-09-01","value": 3},
    {"date": "2018-09-22","value": 3},
    {"date": "2018-09-28","value": 4},
    {"date": "2018-10-18","value": 5},
    {"date": "2018-10-23","value": 6}
  ]
}

I would like to use JQ to group data per month starting at the first value : My month would be :

  • 2018 08 22 to 2018 09 21

  • 2018 09 22 to 2018 10 21

  • 2018 10 22 to 2018 11 21

My expected output is the following :

{
  "data": [  
    {"month": "2018-08-22 to 2018-09-21","sum": 6},
    {"month": "2018-09-22 to 2018-10-21","sum": 12},
    {"month": "2018-10-23 to 2018-11-21","sum": 6}
  ]
}

How can I do that ?


Solution

  • A solution is easily obtained with the following helper function:

    # $day should be the day (an integer) that defines the cutoff point;
    # input should be a Y-M-D string
    def bucket($day):
      def prev: if .m > 1 then .m -= 1 else .m = 12 | .y -= 1 end;
      def fmt: if .m <= 9 then "\(.y)-0\(.m)" else "\(.y)-\(.m)" end;
      capture("(?<y>[0-9]+)-(?<m>[0-9]+)-(?<d>[0-9]+)")
      | map_values(tonumber)
      | if .d >= $day then . else prev end
      | fmt ;
    

    We now compute the bucket-count pairs:

    .data
    | (.[0].date | capture("(?<d>[0-9]+)$") | .d | tonumber) as $day
    | reduce .[] as $pair ({};
      .[$pair.date | bucket($day)] += ($pair.value) )
    

    With the sample data, this produces:

    {
      "2018-08": 6,
      "2018-09": 12,
      "2018-10": 6
    }      
    

    It is now a trivial matter to convert this into the desired format, and is thus left as an exercise.