Using jq to include data taken from two filtered array elements into a single output object


Given input JSON like this (there's a lot more to it really, but I've stripped the fields that aren't of any interest):

{
  "modules": {
    "data": [
      {
        "id": "aod_play_area",
        "data": [
          {
            "titles": {
              "primary": "Primary",
              "secondary": "Secondary"
            }
          }
        ]
      },
      {
        "id": "aod_tracks",
        "data": [
          {
            "titles": {
              "primary": "First Artist name here",
              "secondary": "First Track title here"
            },
            "uris": [
              {
                "id": "commercial-music-service-spotify",
                "uri": "https://open.spotify.com/track/1234567890"
              },
              {
                "id": "commercial-music-service-apple",
                "uri": "https://music.apple.com/gb/album/xyz/1234?i=9876"
              }
            ]
          },
          {
            "titles": {
              "primary": "Second Artist name here",
              "secondary": "Second Track title here"
            },
            "uris": [
              {
                "id": "commercial-music-service-spotify",
                "label": "Spotify",
                "uri": "https://open.spotify.com/track/555555555555"
              },
              {
                "id": "commercial-music-service-apple",
                "label": "Apple Music",
                "uri": "https://music.apple.com/gb/album/abc/5555?i=5555"
              }
            ]
          }
        ]
      }
    ]
  }
}

... and desired output which has two top-level properties, each populated from different elements within the modules.data[] array, indexed by their .id:

{
    "title": "Primary - Secondary",
    "tracks": [
        {
            "title": "First Track title",
            "artist": "First Artist name",
            "start": 3645,
            "end": 3820,
            "apple": "https://music.apple.com/gb/album/xyz/1234?i=9876",
            "spotify": "https://open.spotify.com/track/1234567890"
        },
        {
            "title": "Second Track title",
            "artist": "Second Artist name",
            "start": 3645,
            "end": 3820,
            "apple": "https://music.apple.com/gb/album/abc/5555?i=5555",
            "spotify": "https://open.spotify.com/track/555555555555"
        }
    ]
}

... what should my jq query look like to pull data from those two objects within modules.data? I can write queries to do one or the other, but not both, presumably because my first query has caused jq to walk down one branch of the structure and I don't know how to make it "unwind" so that the second query still works.

Extracting the titles:

cat sample.json | jq '.modules.data.[] | {
    title: select(.id == "aod_play_area").data[0] | "\(.titles.primary) - \(.titles.secondary)",
    tracks: []
}'

Produces:

{
  "title": "Primary - Secondary",
  "tracks": []
}

Extracting just the tracks:

cat sample.json | jq '.modules.data.[] | {
    title: "title",
    tracks: select(.id == "aod_tracks").data | map({
        title: .titles.primary,
        artist: .titles.secondary,
        start: .offset.start,
        end: .offset.end,
        apple:   .uris[] | select(.id =="commercial-music-service-apple").uri,
        spotify: .uris[] | select(.id =="commercial-music-service-spotify").uri
    })
}'

Produces:

{
  "title": "title",
  "tracks": [
    {
      "title": "First Artist name here",
      "artist": "First Track title here",
      "start": null,
      "end": null,
      "apple": "https://music.apple.com/gb/album/xyz/1234?i=9876",
      "spotify": "https://open.spotify.com/track/1234567890"
    },
    {
      "title": "Second Artist name here",
      "artist": "Second Track title here",
      "start": null,
      "end": null,
      "apple": "https://music.apple.com/gb/album/abc/5555?i=5555",
      "spotify": "https://open.spotify.com/track/555555555555"
    }
  ]
}

Combining the two:

cat sample.json | jq '.modules.data.[] | {
    title: select(.id == "aod_play_area").data[0] | "\(.titles.primary) - \(.titles.secondary)",
    tracks: select(.id == "aod_tracks").data | map({
        title: .titles.primary,
        artist: .titles.secondary,
        start: .offset.start,
        end: .offset.end,
        apple:   .uris[] | select(.id =="commercial-music-service-apple").uri,
        spotify: .uris[] | select(.id =="commercial-music-service-spotify").uri
    })
}'

... produces no output at all. I believe this is because the first select has taken us down one "branch" of the outer-most data, so the second select doesn't find what it's looking for (as children of where it's ended up down that first branch). How should I rewrite my query to successfully extract all of the data of interest?

(I'm new to jq, so apologies if I've misused any terminology)


Solution

  • Build the object from .modules.data (the whole array) instead of .modules.data[] (each element in turn), and iterate with .[] inside each value:

    jq '.modules.data |
        {
          title: (.[] | select(.id == "aod_play_area").data[0] | "\(.titles.primary) - \(.titles.secondary)"),
          tracks: (.[] | select(.id == "aod_tracks").data | map({
            # in the sample input, primary holds the artist name and secondary the track title
            title: .titles.secondary,
            artist: .titles.primary,
            start: .offset.start,
            end: .offset.end,
            apple:   .uris[] | select(.id == "commercial-music-service-apple").uri,
            spotify: .uris[] | select(.id == "commercial-music-service-spotify").uri
          }))
        }' sample.json
    

    When you work on .modules.data[], your filter takes .modules.data[0] as input, then .modules.data[1], so it tries to construct two objects, each with missing information:

    { title: ..., tracks: empty }
    { title: empty, tracks: ... }
    

    Each one contains empty, so neither object is emitted and the overall result is empty.
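
The effect of empty inside an object constructor can be checked in isolation. The following is only a sketch, assuming jq is installed. The one-link track in the last two commands is a made-up input for illustration, not part of the original data. It shows that an object constructor emits one object per combination of its values' outputs, so a value with no output suppresses the object entirely, and that wrapping such a value in first(...) // null is one possible way to keep the object.

```shell
#!/bin/sh
# A value that produces no output (empty) means no object is emitted at all.
none=$(jq -n -c '{title: "t", tracks: empty}')
echo "with empty: [$none]"        # prints: with empty: []

# Two outputs per value give 2 x 2 = 4 objects (one per combination).
count=$(jq -n -c '[{a: (1, 2), b: (3, 4)}] | length')
echo "combinations: $count"       # prints: combinations: 4

# Hypothetical track that has a Spotify link but no Apple link.
track='{"uris":[{"id":"commercial-music-service-spotify","uri":"https://open.spotify.com/track/1234567890"}]}'

# The bare select yields nothing for "apple", so the whole object is dropped.
dropped=$(printf '%s' "$track" |
  jq -c '{apple: (.uris[] | select(.id == "commercial-music-service-apple").uri)}')
echo "bare select: [$dropped]"

# first(...) // null falls back to null when nothing matches, keeping the object.
kept=$(printf '%s' "$track" |
  jq -c '{apple: (first(.uris[] | select(.id == "commercial-music-service-apple").uri) // null)}')
echo "with fallback: $kept"
```

The same rule applies to the apple and spotify fields inside map(...): a track missing either link would silently vanish from the tracks array unless the value is guarded in some way like the fallback above.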