
JQ: group input and generate output JSON


Consider the input below:

 {
   "name": "examplename1",
   "Date1": "value1",
   "Date2": "value2",
   "Date3": "value3"
}
 {
   "name": "examplename1",
   "Date1": "value4",
   "Date2": "value5",
   "Date3": "value6"
}
 {
   "name": "examplename2",
   "Date1": "value7",
   "Date2": "value8",
   "Date3": "value9"
}
 {
   "name": "examplename2",
   "Date1": "value10",
   "Date2":"value11",
   "Date3": "value12"
}

The required output is:

{
 "names": "examplename1",
 "availabledates1":[
  "value1",
  "value4"
 ],
 "availabledates2":[
  "value2",
  "value5"
 ],
 "availabledates3":[
  "value3",
  "value6"
 ]
}
{
 "names": "examplename2",
 "availabledates1":[
  "value7",
  "value10"
 ],
 "availabledates2":[
  "value8",
  "value11"
 ],
 "availabledates3":[
  "value9",
  "value12"
 ]
}

The jq filter I am using:

[inputs] | group_by(.name)[] | [{names: .[].name, availabledates1: [.[].Date1], availabledates2: [.[].Date2], availabledates3: [.[].Date3]}] | unique_by(.names) | .[]

The output I get:

{
  "names": "examplename1",
  "availabledates1": [
    "value4"
  ],
  "availabledates2": [
    "value5"
  ],
  "availabledates3": [
    "value6"
  ]
}
{
  "names": "examplename2",
  "availabledates1": [
    "value7",
    "value10"
  ],
  "availabledates2": [
    "value8",
    "value11"
  ],
  "availabledates3": [
    "value9",
    "value12"
  ]
}

Issue 1: This filter ignores the first row of the input.

Issue 2: If the input data set is very large, this filter uses too much memory and eventually fails, since it iterates over the data multiple times, which would need parallel threads.

For reference: https://jqplay.org/s/OOHAuv72GAL

I need a more efficient jq filter that does not fail on large data sets and also includes the first row of the input.


Solution
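
Issue 1 most likely comes from running the filter without `-n` (`--null-input`). By default jq reads the first record into `.` before the filter starts, so `inputs` only yields the records that remain. The sketch below illustrates this with four records cut down to the `name` key; the variable names `sample`, `without_n` and `with_n` are made up for the illustration.

```shell
# Four sample records, reduced to the "name" key for brevity.
sample='{"name":"examplename1"}
{"name":"examplename1"}
{"name":"examplename2"}
{"name":"examplename2"}'

# Without -n, jq consumes the first record as "." before the filter runs,
# so `inputs` only sees the remaining records.
without_n=$(printf '%s\n' "$sample" | jq '[inputs] | length')

# With -n, "." is null and `inputs` yields every record.
with_n=$(printf '%s\n' "$sample" | jq -n '[inputs] | length')

echo "records seen without -n: $without_n"
echo "records seen with -n:    $with_n"
```

This is why the commands in this answer are run with `-n`.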

  • Using group_by:

    # -n stops jq from consuming the first record before `inputs` runs
    jq -n '
      # collect all records, then emit one object per distinct name
      [inputs] | group_by(.name)[] | {
        names: first.name,
        availabledates1: map(.Date1),
        availabledates2: map(.Date2),
        availabledates3: map(.Date3)
      }
    '

  • Using reduce:

    jq -n '
      # read one record at a time and append its values to its name entry
      (reduce inputs as $i ({}; .[$i.name] |= (
        .names = $i.name
        | .availabledates1 += [$i.Date1]
        | .availabledates2 += [$i.Date2]
        | .availabledates3 += [$i.Date3]
      )))[]
    '

  Output:

    {
      "names": "examplename1",
      "availabledates1": [
        "value1",
        "value4"
      ],
      "availabledates2": [
        "value2",
        "value5"
      ],
      "availabledates3": [
        "value3",
        "value6"
      ]
    }
    {
      "names": "examplename2",
      "availabledates1": [
        "value7",
        "value10"
      ],
      "availabledates2": [
        "value8",
        "value11"
      ],
      "availabledates3": [
        "value9",
        "value12"
      ]
    }
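
On Issue 2, a rough way to see the difference between the two approaches: `[inputs]` holds every record in one array and `group_by` then sorts it, whereas `reduce inputs` reads one record at a time and keeps only the accumulated groups. The accumulator still grows with the number of values, so this should lower memory use rather than make it constant. The sketch below runs both variants on the sample records from the question and compares the results. The variable names (`records`, `by_group`, `by_reduce`) are made up for the illustration, the key names follow the required output in the question, and `-c -S` (compact output, sorted keys) is added only to make the comparison easy.

```shell
# The four sample records from the question, one per line.
records='{"name":"examplename1","Date1":"value1","Date2":"value2","Date3":"value3"}
{"name":"examplename1","Date1":"value4","Date2":"value5","Date3":"value6"}
{"name":"examplename2","Date1":"value7","Date2":"value8","Date3":"value9"}
{"name":"examplename2","Date1":"value10","Date2":"value11","Date3":"value12"}'

# Variant 1: slurp everything into an array, then group by name.
by_group=$(printf '%s\n' "$records" | jq -n -c -S '
  [inputs] | group_by(.name)[] | {
    names: first.name,
    availabledates1: map(.Date1),
    availabledates2: map(.Date2),
    availabledates3: map(.Date3)
  }
')

# Variant 2: fold the records one at a time into an object keyed by name.
by_reduce=$(printf '%s\n' "$records" | jq -n -c -S '
  reduce inputs as $i ({}; .[$i.name] |= (
    .names = $i.name
    | .availabledates1 += [$i.Date1]
    | .availabledates2 += [$i.Date2]
    | .availabledates3 += [$i.Date3]
  )) | .[]
')

# Print the reduce result (one group per line) and report whether both agree.
printf '%s\n' "$by_reduce"
if [ "$by_group" = "$by_reduce" ]; then
  echo "both approaches agree"
else
  echo "outputs differ"
fi
```

For a real data set, the record stream would come from a file argument to jq instead of the `records` variable.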