Search code examples
jsongroup-byaggregatejq

Use jq to count elements based on group by


I have a list of movies like this:

[
   {
      "title":"X",
      "genres":[
         {
            "tag":"Horror"
         },
         {
            "tag":"Thriller"
         },
         {
            "tag":"Mystery"
         }
      ]
   },
   {
      "title":"Zero Dark Thirty",
      "genres":[
         {
            "tag":"Thriller"
         },
         {
            "tag":"Drama"
         },
         {
            "tag":"History"
         },
         {
            "tag":"War"
         }
      ]
   }
]

I want to query all unique genres and count the number of movies, where the output looks like this:

{
   "Horror":1,
   "Thriller":2,
   "Mystery":1,
   "Drama":1,
   "History":1,
   "War":1
}

Is this possible with jq?

Edit: I also need to be able to handle movies without genres. These can be ignored. For example:

[
   {
      "title":"Some movie",
      "genres":[
         {
            "tag": null
         }
      ]
   },
   {
      "title":"Some other movies",
      "genres":[
         {
            "tag":"Comedy"
         },
         {
            "tag":"Drama"
         }
      ]
   }
]

Solution

  • Yes, it is.

    1. Extract all genres into array
    2. Group genres
    3. Map to a key-value pair (key = any element of the group, we'll take the first; value = count of elements in the group)
    4. Build object from key-value pairs
    map(.genres[].tag)
    | group_by(.)
    | map({ key:first, value:length })
    | from_entries
    

    Output:

    {
      "Drama": 1,
      "History": 1,
      "Horror": 1,
      "Mystery": 1,
      "Thriller": 2,
      "War": 1
    }
    

    Alternatively, use a reduce based approach and simply increase a counter:

    reduce .[].genres[].tag as $genre ({}; .[$genre] += 1)
    

    This is likely more efficient than building an array and grouping.


    To select only non-null tags, filter through values:

    map(.genres[].tag | values)
    | group_by(.)
    | map({ key:first, value:length })
    | from_entries
    

    or

    reduce (.[].genres[].tag|values) as $genre ({}; .[$genre] += 1)
    

    values is internally defined as select(. != null).