Search code examples
csvjqfrequency-distribution

using jq to output in tsv unique values of an element with frequency of occurrence


given the following json, how could i create a tsv, that would output the frequencies of the values found in it (in the new element)?

{"_id":{"$someid":"545"},"new":"813 ROT","old":{"$someid":"5fe"}}
{"_id":{"$someid":"659889a"},"new":"9878 SU","old":{"$someid":"5bc"}}
{"_id":{"$someid":"986978754b"},"new":"813 ROT","old":{"$someid":"5bc1af"}}

desired output:

813 ROT 2
9878 SU 1

i can extract the values with .new, but do not know how to group them and create the tsv then

what i have tried:

def counter(stream):   reduce stream as $s ({}; .[$s|tostring] += 1);  counter(inputs | .new) | to_entries[] | {ItemId: (.key), Count: .value}

but gives output:

{
  "ItemId": "9878 SU",
  "Count": 1
}
{
  "ItemId": "813 ROT",
  "Count": 1
}

i use version of jq found in ubuntu 16.04 repositories


Solution

  • You can try this jq filter:

    jq -sr 'group_by(.new)|.[]|[.[0].new, length]|@tsv'
    813 ROT 2
    9878 SU 1
    

    Option -s allows to get full content of the file into an array (to apply the filter to the full content).

    Option -r displays raw data.

    The group_by function groups into an array all object having the same new key.

    The .[0].new gets the value of the key new and length the number of key with the same value.

    The operator @tsv formats into tab separated values.