
Merge json arrays with duplicate keys


I want to merge two JSON arrays using jq. Each object in both arrays has a name field, which lets me group by name and merge the two arrays into one.

LABELS

[
  {
    "name": "power_branch",
    "description": "master"
  },
  {
    "name": "test_branch",
    "description": "main"
  }
]

RUNNERS

[
  {
    "name": "power_branch",
    "runner": "power",
    "runner_tag": "macos"
  },
  {
    "name": "power_branch",
    "runner": "power",
    "runner_tag": "ubuntu"
  },
  {
    "name": "test_branch",
    "runner": "tester",
    "runner_tag": ""
  },
  {
    "name": "development",
    "runner": "dev",
    "runner_tag": "ubuntu"
  }
]

Desired Output

[
  {
    "name": "power_branch",
    "description": "master",
    "runner": "power",
    "runner_tag": "macos"
  },
  {
    "name": "power_branch",
    "description": "master",
    "runner": "power",
    "runner_tag": "ubuntu"
  },
  {
    "name": "test_branch",
    "description": "main",
    "runner": "tester",
    "runner_tag": ""
  }
]

I tried the following script, but the power_branch entry was overwritten. Instead, I want a separate entry for each runner_tag:

#!/usr/bin/bash

LABELS='[{"name": "power_branch","description": "master"},{"name": "test_branch","description": "main"}]'
RUNNERS='
[
  { "name": "power_branch", "runner": "power", "runner_tag": "macos" },
  { "name": "power_branch", "runner": "power", "runner_tag": "ubuntu" },
  { "name": "test_branch", "runner": "tester", "runner_tag": "" },
  { "name": "development", "runner": "dev", "runner_tag": "ubuntu" }
]
'

FINAL=$(jq -s '[ .[0] + .[1] | group_by(.name)[] | select(length > 1) | add]' <(echo "$LABELS") <(echo "$RUNNERS"))
echo "$FINAL"

OUTPUT

[
  {
    "name": "power_branch",
    "description": "master",
    "runner": "power",
    "runner_tag": "ubuntu"
  },
  {
    "name": "test_branch",
    "description": "main",
    "runner": "tester",
    "runner_tag": ""
  }
]

Solution

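  • If you have two files, labels.json and runners.json, you can pass the latter (runners) in as a variable using --argjson. Then use map on the input array (labels) to add to each element the fields of every runner whose name matches, as picked out by select. Because $runners[] | select(…) can emit several matches, a label produces one merged object per matching runner, and entries without a counterpart in the other array are dropped.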

    jq --argjson runners "$(cat runners.json)" '
      map(.name as $name | . + ($runners[] | select(.name == $name)))
    ' labels.json
    

    However, this passes the whole runners array through your shell's command line (--argjson takes two strings: a name and a value), which can exceed the argument length limit if the runners array gets big enough.
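
As a rough way to judge how close a file comes to that limit, the sketch below compares its size with the value reported by getconf ARG_MAX. The scratch file and its contents are made up for illustration; substitute your own runners.json.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for runners.json, created only for this demonstration.
tmp=$(mktemp -d)
printf '%s\n' '[{"name":"power_branch","runner":"power","runner_tag":"macos"}]' > "$tmp/runners.json"

# ARG_MAX is the system limit (in bytes) on the combined size of the
# arguments and environment handed to a newly executed program.
limit=$(getconf ARG_MAX)

# Size of the data that "$(cat runners.json)" would place on jq's command line.
size=$(wc -c < "$tmp/runners.json")
size=$((size))  # normalize: some wc implementations pad the number with spaces

echo "runners.json: $size bytes, ARG_MAX: $limit bytes"
rm -r "$tmp"
```

On Linux, a single argument is typically capped well below ARG_MAX (commonly at 128 KiB), so a large file can fail earlier than this comparison suggests.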

    Therefore, instead of using command substitution "$(…)", you can read the runners file directly, using either --slurpfile at the cost of another iteration level ([][]), or --argfile with just a single iteration level as before (despite the manual advising against it; read more about it in the comments):

    jq --slurpfile runners runners.json '
      map(.name as $name | . + ($runners[][] | select(.name == $name)))
    ' labels.json
    
    jq --argfile runners runners.json '
      map(.name as $name | . + ($runners[] | select(.name == $name)))
    ' labels.json
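
The extra iteration level comes from --slurpfile collecting every JSON value in the file into an array, so a file that holds one array arrives wrapped in another. A minimal sketch, using a hypothetical two-element scratch file rather than the real runners.json:

```shell
#!/usr/bin/env bash
# Hypothetical scratch file, created only for this demonstration.
tmp=$(mktemp -d)
printf '%s\n' '[{"name":"a"},{"name":"b"}]' > "$tmp/runners.json"

# The outer array has one element: the array stored in the file.
outer=$(jq -n --slurpfile runners "$tmp/runners.json" '$runners | length')

# That single element is the original two-element array.
inner=$(jq -n --slurpfile runners "$tmp/runners.json" '$runners[0] | length')

# Two iteration levels ([][]) are therefore needed to reach the objects.
names=$(jq -nc --slurpfile runners "$tmp/runners.json" '[$runners[][] | .name]')

echo "outer=$outer inner=$inner names=$names"
rm -r "$tmp"
```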
    

    To circumvent all these issues, @peak suggested using input for each file together with the -n option. Note that this requires the two files to be provided in exactly this order, as they are read sequentially.

    jq -n 'input as $runners | input |
      map(.name as $name | . + ($runners[] | select(.name == $name)))
    ' runners.json labels.json
    

    As the second input (labels) is passed on directly as the filter's main input (in contrast to runners, which is stored in a variable for later use), this can be simplified further by dropping the -n option. Without -n, jq reads the first file as the filter's input and input reads the next one, so the order of the files still matters, and labels.json now has to come first:

    jq 'input as $runners |
      map(.name as $name | . + ($runners[] | select(.name == $name)))
    ' labels.json runners.json
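
To see which file ends up where, the following sketch runs jq with and without -n on two hypothetical scratch files that each hold a single string. It is only meant to illustrate how input consumes files, not to replace the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical scratch files, created only for this demonstration.
tmp=$(mktemp -d)
printf '%s\n' '"first"'  > "$tmp/a.json"
printf '%s\n' '"second"' > "$tmp/b.json"

# With -n, "." starts out as null and each call to input reads the next file in order.
with_n=$(jq -nc '[., input, input]' "$tmp/a.json" "$tmp/b.json")

# Without -n, the first file has already been read as ".", so input returns the second file.
without_n=$(jq -c '[., input]' "$tmp/a.json" "$tmp/b.json")

echo "with -n:    $with_n"
echo "without -n: $without_n"
rm -r "$tmp"
```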
    

    Finally, here's yet another approach using the SQL-style operators INDEX and JOIN, which were introduced in jq v1.6. It also uses just one input call, and the order of the files still matters because the runners array has to be the filter's primary input.

    jq '
      JOIN(INDEX(input[]; .name); .name) | map(select(.[1]) | add)
    ' runners.json labels.json
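
To make the INDEX/JOIN pipeline easier to follow, this sketch prints its intermediate results step by step. The two scratch files are cut-down, hypothetical versions of labels.json and runners.json, chosen so that one runner has a matching label and one does not.

```shell
#!/usr/bin/env bash
# Cut-down, hypothetical versions of the two files, created only for this demonstration.
tmp=$(mktemp -d)
printf '%s\n' '[{"name":"power_branch","description":"master"}]' > "$tmp/labels.json"
printf '%s\n' '[{"name":"power_branch","runner":"power"},{"name":"development","runner":"dev"}]' > "$tmp/runners.json"

# Step 1: INDEX builds a lookup object whose keys are the label names.
index=$(jq -c 'INDEX(.[]; .name)' "$tmp/labels.json")

# Step 2: JOIN pairs every runner with its lookup entry;
# a runner without a label is paired with null.
pairs=$(jq -c 'JOIN(INDEX(input[]; .name); .name)' "$tmp/runners.json" "$tmp/labels.json")

# Step 3: drop the pairs whose second element is null,
# then merge each remaining pair into one object with add.
merged=$(jq -c 'JOIN(INDEX(input[]; .name); .name) | map(select(.[1]) | add)' "$tmp/runners.json" "$tmp/labels.json")

printf '%s\n' "$index" "$pairs" "$merged"
rm -r "$tmp"
```

Because the runner is the first element of each pair, its keys come first in the merged objects, so the key order differs from the map-based variants even though the content is the same.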