json, duplicates, jq, key-value

How do I add a new key to a jq array element whenever a key/value pair repeats itself?


This is the input JSON. In this example, the key/value pair "foo": "bar" keeps repeating at random. Order is not important, even though the pair appears to repeat alternately.

[
  {
    "foo": "bar",
    "id": "baz"
  },
  {
    "thud": "grunt",
    "id": "fum"
  },
  {
    "foo": "bar",
    "id": "noot"
  },
  {
    "zot": "toto",
    "id": "pluto"
  },
  {
    "foo": "bar",
    "id": "toto"
  }  
]

Whenever a key/value pair is repeated, rather than removing it, I want to add an additional key/value pair to that particular element, as shown below. The desired output would be:

[
  {
    "foo": "bar",
    "id": "baz"
  },
  {
    "thud": "grunt",
    "id": "fum"
  },
  {
    "foo": "bar",
    "id": "noot",
    "desc": "1st duplicate found"
  },
  {
    "zot": "toto",
    "id": "pluto"
  },
  {
    "foo": "bar",
    "id": "toto",
    "desc": "2nd duplicate found"
  } 
]

Again, the order and numbering are not relevant or required; I added them for illustration only.

I found several solutions for removing duplicates but was unable to make any headway on this problem.

I would appreciate any proposed solution for the above.

Thanks very much for your time.

I tried a complex solution that splits the JSON into two parts and merges them with -n and --argjson, without much of a breakthrough.


Solution

  • Here's one approach using tostream and fromstream to deconstruct and reconstruct the input via its stream representation, which is a stream of arrays, each containing a path and (for leaf values) its corresponding value. A foreach loop iterates over this stream, re-emitting each item for later reconstruction. It also keeps track of each path-value pair with the path's first item stripped (so matches occur irrespective of their position in the original input array) and counts each appearance. If that count is higher than one, it also outputs another item (distinguished by appending _dup to the last path item) with the current count as its value. Note that if without an else branch requires jq 1.7 or later.

    fromstream(
      foreach (tostream | [., (.[0] |= .[1:] | @json)]) as [$s,$j] (
        {};
        if $s | has(1) then .[$j] += 1 end;
        if .[$j] > 1 then [($s[0] | last += "_dup"), .[$j]] else empty end,
        $s
      )
    )

    Output for the sample input:

    [
      {
        "foo": "bar",
        "id": "baz"
      },
      {
        "thud": "grunt",
        "id": "fum"
      },
      {
        "foo_dup": 2,
        "foo": "bar",
        "id": "noot"
      },
      {
        "zot": "toto",
        "id": "pluto"
      },
      {
        "foo_dup": 3,
        "foo": "bar",
        "id": "toto"
      }
    ]
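
For readers who want to check the counting logic outside jq, here is a minimal Python sketch of the same bookkeeping, applied per object instead of through the stream representation. It is not the answer's method: the function name mark_duplicates and the wording of the desc text are assumptions made for illustration, and it writes the desc key from the question rather than foo_dup.

```python
import json


def mark_duplicates(items):
    """Return copies of the objects in `items`; every repeat of a
    key/value pair gets an extra "desc" key (hypothetical wording)."""
    seen = {}  # (key, JSON-encoded value) -> number of sightings so far
    result = []
    for item in items:
        out = dict(item)  # shallow copy, so the input list is left untouched
        for key, value in item.items():
            # Encode the value so nested lists/objects can be used as dict keys.
            pair = (key, json.dumps(value, sort_keys=True))
            seen[pair] = seen.get(pair, 0) + 1
            if seen[pair] > 1:
                # Second sighting is duplicate #1, third is #2, and so on.
                # If several pairs in one object repeat, the last one wins.
                out["desc"] = f"duplicate #{seen[pair] - 1} found"
        result.append(out)
    return result
```

Called on the sample input, this leaves the first "foo": "bar" object unchanged and adds desc to the later ones. As in the jq filter, every key/value pair is counted, so a repeated "id" value would be flagged too.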
    
