
Generate a field-value frequency count with jq


I can query all the unique values from a JSON field like so:

$ cat all.json | jq '.complianceState' | sort | uniq

"compliant"
"configManager"
"inGracePeriod"
"noncompliant"
"unknown"

And I can laboriously count the occurrences of each unique value, one value at a time, like so:

$ cat all.json | jq '.complianceState' | grep '^"configManager"$' | wc -l

116

Is there a way to do all of this in one shot within jq, producing output like the following?

{
    "compliant" : 123000,
    "noncompliant" : 2000,
    "configManager" : 116
}

Solution

  • From my standard library:

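    # bag of words: count how often each value occurs in a stream
    # WARNING: this is not collision-free! Keys are built with tostring,
    # so distinct values such as 1 and "1" share one counter.
    def bow(stream):
      reduce stream as $word ({}; .[($word|tostring)] += 1);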
    
    

    With this, you could use the filter:

    bow(inputs | .complianceState)
    

    in conjunction with the -n command-line option.
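
    The role of -n can be seen in a small sketch. It assumes a hypothetical three-record stream (the sample helper below is made up for illustration) and inlines the body of bow for brevity. Without -n, jq reads the first record as `.` before the program runs, so `inputs` only yields the remaining records and the first one goes uncounted.

    ```shell
    # Hypothetical three-record stream, for illustration only.
    sample() {
      printf '%s\n' \
        '{"complianceState":"compliant"}' \
        '{"complianceState":"unknown"}' \
        '{"complianceState":"compliant"}'
    }

    # Inlined body of bow, applied to the complianceState field.
    count='reduce (inputs | .complianceState) as $s ({}; .[($s|tostring)] += 1)'

    # With -n, `inputs` yields all three records.
    # -c prints compact output; -S sorts keys so the order is predictable.
    with_n=$(sample | jq -n -c -S "$count")
    echo "$with_n"      # {"compliant":2,"unknown":1}

    # Without -n, the first record is consumed as `.` and never reaches `inputs`.
    without_n=$(sample | jq -c -S "$count")
    echo "$without_n"   # {"compliant":1,"unknown":1}
    ```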

    In summary

    One way to pull all this together would be to place the definition of bow and the filter above in a file, say bow.jq, and invoke jq as follows:

    jq -n -f bow.jq all.json
    

    Another would be to use jq's module system -- see the jq manual and/or the jq Cookbook for details.
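
As a rough sketch of the module route, the definition can live in its own file and be pulled in with `include`. The scratch directory, the `lib` subdirectory name, and the sample records below are all assumptions made for illustration; only `bow` itself comes from the answer above.

```shell
set -eu

# Scratch directory with a hypothetical layout: lib/bow.jq plus sample data.
workdir=$(mktemp -d)
mkdir "$workdir/lib"

# lib/bow.jq holds only the definition, so it can be included as a library.
cat > "$workdir/lib/bow.jq" <<'EOF'
def bow(stream):
  reduce stream as $word ({}; .[($word|tostring)] += 1);
EOF

# Hypothetical sample data standing in for all.json.
cat > "$workdir/all.json" <<'EOF'
{"complianceState": "compliant"}
{"complianceState": "noncompliant"}
{"complianceState": "compliant"}
EOF

# -L adds a directory to jq's module search path;
# include "bow" then loads bow.jq from that directory.
result=$(jq -n -c -S -L "$workdir/lib" \
  'include "bow"; bow(inputs | .complianceState)' "$workdir/all.json")
echo "$result"   # {"compliant":2,"noncompliant":1}

rm -r "$workdir"
```

Keeping the definition separate from the filter means the same bow.jq can be reused for other fields by changing only the expression passed on the command line.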