yq: traverse complex values and convert them to a JSON file


I have a complex YAML file and want to extract information from it in a bash script.

The YAML looks like this:

content:
  images:
    sha256:4c8b96d4fffdfae29258d94a22ae4ad1fe36139d47288b8960d9958d1e63a9d0:
      annotations:
        kbld.carvel.dev/id: index.docker.io/dkalinin/k8s-simple-app
        kbld.carvel.dev/origins: |
          - resolved:
              tag: latest
              url: index.docker.io/dkalinin/k8s-simple-app
      image: image1
      imageType: Image
      layers:
      - digest: sha256:8ece9ac45f2b7228b2ed95e9f407b4f0dc2ac74f93c62ff1156f24c53042ba54
      origin: index.docker.io/dkalinin/k8s-simple-app@sha256:4c8b96d4fffdfae29258d94a22ae4ad1fe36139d47288b8960d9958d1e63a9d0
    sha256:4c8b96d4fffdfae29258d94a22ae4ad1fe36139d47288b8960d9958d1e63a9d1:
      annotations:
        kbld.carvel.dev/id: index.docker.io/dkalinin/k8s-simple-app
        kbld.carvel.dev/origins: |
          - resolved:
              tag: latest
              url: index.docker.io/dkalinin/k8s-simple-app
      image: image2
      imageType: Image
      layers:
      - digest: sha256:8ece9ac45f2b7228b2ed95e9f407b4f0dc2ac74f93c62ff1156f24c53042ba54
      - digest: sha256:8ece9ac45f2b7228b2ed95e9f407b4f0dc2ac74f93c62ff1156f24c53042ba5h
      - digest: sha256:8ece9ac45f2b7228b2ed95e9f407b4f0dc2ac74f93c62ff1156f24c53042ba6u
      origin: index.docker.io/dkalinin/k8s-simple-app@sha256:4c8b96d4fffdfae29258d94a22ae4ad1fe36139d47288b8960d9958d1e63a9d0
    sha256:4c8b96d4fffdfae29258d94a22ae4ad1fe36139d47288b8960d9958d1e63a9d2:
      annotations:
        kbld.carvel.dev/id: index.docker.io/dkalinin/k8s-simple-app
        kbld.carvel.dev/origins: |
          - resolved:
              tag: latest
              url: index.docker.io/dkalinin/k8s-simple-app
      image: image3
      imageType: Image
      layers:
      - digest: sha256:8ece9ac45f2b7228b2ed95e9f407b4f0dc2ac74f93c62ff1156f24c53042ba54
      origin: index.docker.io/dkalinin/k8s-simple-app@sha256:4c8b96d4fffdfae29258d94a22ae4ad1fe36139d47288b8960d9958d1e63a9d0

The expected result is:

{
"image1":
 ["8ece9ac45f2b7228b2ed95e9f407b4f0dc2ac74f93c62ff1156f24c53042ba54"], 
"image2": 
  ["8ece9ac45f2b7228b2ed95e9f407b4f0dc2ac74f93c62ff1156f24c53042ba54", 
   "8ece9ac45f2b7228b2ed95e9f407b4f0dc2ac74f93c62ff1156f24c53042ba5h", 
   "8ece9ac45f2b7228b2ed95e9f407b4f0dc2ac74f93c62ff1156f24c53042ba6u"], 
"image3": 
  ["8ece9ac45f2b7228b2ed95e9f407b4f0dc2ac74f93c62ff1156f24c53042ba54"]
}

The keys image1, image2, and image3 are the values of the image field of each entry under .content.images.

Each value is the list of that entry's layers digests as strings, with the sha256: prefix removed.

I first tried to get all keys with images=$(yq -r '.content.images | keys' file) and then look up each value by its key, but I could not work out how to traverse the values because they are nested. Is there a simpler way to achieve this?

I'm using the Go version of yq.


Solution

  • You can use with_entries to rewrite each entry's .key and .value. Alternatively, update each item in place with .[] |=, setting its key to the image name and replacing the item itself with the digest list. In both cases, sub with "" as its second argument removes the substring matched by the regex:

    yq -oj '.content.images | with_entries(
      .key = .value.image | .value |= [.layers[].digest | sub("^sha256:", "")]
    )' file
    
    # or
    
    yq -oj '.content.images | .[] |= (
      key = .image | [.layers[].digest | sub("^sha256:", "")]
    )' file
    
    {
      "image1": [
        "8ece9ac45f2b7228b2ed95e9f407b4f0dc2ac74f93c62ff1156f24c53042ba54"
      ],
      "image2": [
        "8ece9ac45f2b7228b2ed95e9f407b4f0dc2ac74f93c62ff1156f24c53042ba54",
        "8ece9ac45f2b7228b2ed95e9f407b4f0dc2ac74f93c62ff1156f24c53042ba5h",
        "8ece9ac45f2b7228b2ed95e9f407b4f0dc2ac74f93c62ff1156f24c53042ba6u"
      ],
      "image3": [
        "8ece9ac45f2b7228b2ed95e9f407b4f0dc2ac74f93c62ff1156f24c53042ba54"
      ]
    }