jq filter to remove a field value if it contains an element from a list


This is somewhat similar to "jq: how to filter an array of objects based on values in an inner array?", but extended.

I have a list of values I want to filter out, but it is not a 1:1 match; it's a "contains" (substring) match.

Given input like this (from a file or from a pipe):

{
  "namespace": "namespace1",
  "name": "some-pod1",
  "images": [
    "acr1.azurecr.io/some_project/some_project_image@sha256:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    "docker.io/istio/proxyv2@sha256:57621adeb78e67c52e34ec1676d1ae898b252134838d60298c7446d0964551cc"
  ],
  "initImages": [
    "docker.io/istio/proxyv2@sha256:57621adeb78e67c52e34ec1676d1ae898b252134838d60298c7446d0964551cc"
  ]
}
{
  "namespace": "namespace1",
  "name": "some-pod2",
  "images": [
    "acr1.azurecr.io/some_project/some_project_image@sha256:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    "docker.io/istio/proxyv2@sha256:57621adeb78e67c52e34ec1676d1ae898b252134838d60298c7446d0964551cc"
  ],
  "initImages": [
    "docker.io/istio/proxyv2@sha256:57621adeb78e67c52e34ec1676d1ae898b252134838d60298c7446d0964551cc"
  ]
}

and this query:

jq -r '.[] as $pods | ["kube-system", "kube-public", "gatekeeper-system", "istio-system", "istio-operator"] as $excludedNamespaces | ["istio", "registry.k8s", "mcr.microsoft.com", "azurecr.io"] as $excludedImages | $pods | select((.initImages[] | contains("docker.io")) or (.images[] | contains("docker.io"))) | select(.namespace as $in | $excludedNamespaces | index($in) | not) | del(.images | select(contains($excludedImages))) ' pods.json

I need to do something like:

del(.images[] | select(contains("istio") or contains("registry.k8s") or ...))

Since I already have the $excludedImages list, I want to use it directly, so that if I ever need to change it I can do so in one place. I tried passing the whole list to the contains function, as above, but it doesn't work as expected: I still see "istio" images in the output.
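One likely explanation involves how `contains` behaves when both the input and the argument are arrays. In that case it returns true only if every element of the argument is a substring of at least one element of the input. As a result, `.images | contains($excludedImages)` stays false unless all four patterns appear, so `select` yields nothing and `del` removes nothing. Even when the condition is true, `del(.images | select(...))` targets the whole `images` field, not individual entries.

The minimal demonstration below assumes `jq` is on the `PATH`. The image strings are shortened placeholders.

```shell
# Two shortened image references: one istio image, one plain Docker Hub image.
images='["docker.io/istio/proxyv2@sha256:abc", "docker.io/library/nginx"]'

# Array-vs-array contains: true only if EVERY pattern is a substring
# of at least one element. "azurecr.io" matches nothing here.
all_patterns=$(printf '%s\n' "$images" | jq 'contains(["istio", "azurecr.io"])')

# With a single pattern that does occur, the same call is true.
one_pattern=$(printf '%s\n' "$images" | jq 'contains(["istio"])')

echo "$all_patterns $one_pattern"   # false true
```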

I just want to remove those values from the list. And if the images list ends up empty (the same applies to initImages), I want to remove the whole object from the output.

The background is that Docker enabled throttling on Docker Hub, and I want to find pods that still pull images directly from Docker Hub instead of from our own registry. However, some images, such as istio, are exempt from throttling and can still be pulled directly from Docker Hub. That's the whole idea.

So I prepared one list of namespaces and one list of images to exclude. Namespaces are an exact (1:1) match, so there's no issue there. The image references, however, contain digests (hashes) that keep changing, so I cannot list them verbatim.
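The two kinds of exclusion call for different predicates. `IN` tests exact equality, which suits namespaces. For image references whose digests change, a substring test works better: collect the `contains` result for every pattern, then apply `any`. The sketch below assumes jq 1.6 or later (for `IN`) and uses shortened versions of the question's lists.

```shell
# Shortened versions of the question's exclusion lists.
ns_list='["kube-system", "istio-system"]'
img_list='["istio", "azurecr.io"]'

# Exact (1:1) match: is the namespace equal to one of the listed names?
ns_hit=$(jq -n --argjson ns "$ns_list" '"kube-system" | IN($ns[])')

# Substring match: does the image reference contain ANY listed fragment?
# contains($ex[]) yields one boolean per fragment; any reduces them to one.
img_hit=$(jq -n --argjson ex "$img_list" \
  --arg img 'docker.io/istio/proxyv2@sha256:abc' '$img | [contains($ex[])] | any')
img_miss=$(jq -n --argjson ex "$img_list" \
  --arg img 'docker.io/library/nginx@sha256:def' '$img | [contains($ex[])] | any')

echo "$ns_hit $img_hit $img_miss"   # true true false
```

The solution's `any(.; contains($excludedImages[]))` expresses the same substring test.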

Any help would be much appreciated.


Solution

  • I just want to remove those values from list. And if images list is empty (same will apply to initImages) then I want to remove whole object from output.

    For the select condition inside del, use any over all excluded items instead of chaining separate disjunctions with or. To delete the entire object when its arrays are empty, a simple select suffices, since it defines what to keep. I've also used any here because you probably want to keep an object if either images or initImages is non-empty (i.e. delete it only if both are empty).

    Note that all items in your sample input unfortunately do contain either "istio" or "azurecr.io", so eventually both objects will be removed. If you update your sample data to better exemplify the different cases, I'll also adapt this response to it.

    ["istio", "registry.k8s", "mcr.microsoft.com", "azurecr.io"] as $excludedImages
    | del(.images[], .initImages[] | select(any(.; contains($excludedImages[]))))
    | select(any(.images, .initImages; . != []))
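The sketch below shows one way to combine the solution's filter with the question's namespace exclusion and `docker.io` check. It rests on three assumptions:

- jq 1.6 or later is available (for `IN`).
- The input is a stream of pod objects, as in the sample. If `pods.json` is a single JSON array, as the question's `.[]` suggests, prefix the filter with `.[] |`.
- The shortened digests and the pods `some-pod3` and `some-pod4` are hypothetical. They are added so that one pod survives and one is dropped by namespace.

```shell
# Hypothetical sample input: a stream of pod objects, like the question's.
pods='
{"namespace":"namespace1","name":"some-pod1",
 "images":["acr1.azurecr.io/some_project/some_project_image@sha256:aaa",
           "docker.io/istio/proxyv2@sha256:bbb"],
 "initImages":["docker.io/istio/proxyv2@sha256:bbb"]}
{"namespace":"namespace1","name":"some-pod3",
 "images":["docker.io/library/nginx@sha256:ccc"],
 "initImages":[]}
{"namespace":"kube-system","name":"some-pod4",
 "images":["docker.io/library/busybox@sha256:ddd"],
 "initImages":[]}
'

filter='
  ["kube-system", "kube-public", "gatekeeper-system", "istio-system", "istio-operator"] as $excludedNamespaces
  | ["istio", "registry.k8s", "mcr.microsoft.com", "azurecr.io"] as $excludedImages
  # Drop pods in excluded namespaces (exact match).
  | select(.namespace | IN($excludedNamespaces[]) | not)
  # Remove image entries that contain any excluded fragment (substring match).
  | del(.images[], .initImages[] | select(any(.; contains($excludedImages[]))))
  # Keep the pod only if a remaining image reference still contains docker.io.
  | select(any(.images[], .initImages[]; contains("docker.io")))
'

result=$(printf '%s' "$pods" | jq -c "$filter")
echo "$result"
# {"namespace":"namespace1","name":"some-pod3","images":["docker.io/library/nginx@sha256:ccc"],"initImages":[]}
```

The final `select` replaces the solution's non-empty check, because a pod whose arrays are both empty has no image left that could match `docker.io`. Using `any(...)` there, rather than `or` between two generators such as `.initImages[] | contains(...)`, yields exactly one boolean per pod. As a result, `select` emits each pod at most once.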