jq: iterate over regex matches in a string


I'm reworking some JSON using jq and trying to extract some strings from a larger description and move them into an array of related controls.

Here's my input JSON:

{"description": "Fail-safe procedures include, for example, alerting operator personnel and providing specific instructions on subsequent steps to take (e.g., do nothing, re-establish system settings, shut down processes, restart the system, or contact designated organizational personnel). Related controls: CA-2, CA-7, CM-3, CM-5, CM-8, MA-2, IR-4, RA-5, SA-10, SA-1x, SI-1x"}

The output I want is:

{"description": "Fail-safe procedures include, for example, alerting operator personnel and providing specific instructions on subsequent steps to take (e.g., do nothing, re-establish system settings, shut down processes, restart the system, or contact designated organizational personnel).",
"relatedControls": ["CA-2", "CA-7", "CM-3", "CM-5", "CM-8", "MA-2", "IR-4", "RA-5", "SA-10", "SA-1x", "SI-1x"]}

I've worked out something I think is pretty close, but it produces multiple objects (one per match) instead of a single object with an array of controls like I wanted.

jq '. | {description: .description | sub(" Related controls:.*";""), relatedControls: .description | scan("[A-Z]{2}-\\d[0-9x]?") }'

Here's the whole thing on one line so it's easy to test:

echo '{"description": "Fail-safe procedures include, for example, alerting operator personnel and providing specific instructions on subsequent steps to take (e.g., do nothing, re-establish system settings, shut down processes, restart the system, or contact designated organizational personnel). Related controls: CA-2, CA-7, CM-3, CM-5, CM-8, MA-2, IR-4, RA-5, SA-10, SA-1x, SI-1x"}' | jq '. | {description: .description | sub(" Related controls:.*";""), relatedControls: .description | scan("[A-Z]{2}-\\d[0-9x]?") }'

jq wizards... what am I missing to get the output I'm after?


Solution

  • You could just split with the / operator at " Related controls: ", then split again at ", ":

    .description / " Related controls: "
    | {description: .[0], relatedControls: (.[1] / ", ")}
    

    Alternatively, here's another approach using capture and scan with your regular expressions:

    .description
    | capture("(?<description>.*) Related controls: (?<relatedControls>.*)")
    | .relatedControls |= [scan("[A-Z]{2}-\\d[0-9x]?")]
    

    Output:

    {
      "description": "Fail-safe procedures include, for example, alerting operator personnel and providing specific instructions on subsequent steps to take (e.g., do nothing, re-establish system settings, shut down processes, restart the system, or contact designated organizational personnel).",
      "relatedControls": [
        "CA-2",
        "CA-7",
        "CM-3",
        "CM-5",
        "CM-8",
        "MA-2",
        "IR-4",
        "RA-5",
        "SA-10",
        "SA-1x",
        "SI-1x"
      ]
    }
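
As for why the original attempt produced several objects: scan emits one output per match, and an object construction yields one object for every output of its value expressions. Wrapping the scan in [ ... ] collects the matches into a single array. A minimal sketch of the repaired filter follows; the function name split_controls and the shortened sample input are assumptions made for illustration only.

```shell
# Hypothetical helper name; it wraps the filter from the question with one change.
split_controls() {
  jq -c '{
    description: (.description | sub(" Related controls:.*"; "")),
    # [ ... ] gathers every scan match into one array
    relatedControls: [.description | scan("[A-Z]{2}-\\d[0-9x]?")]
  }'
}

# Shortened sample input (an assumption for brevity, not the full text above)
echo '{"description": "Some text. Related controls: CA-2, SA-10, SI-1x"}' | split_controls
```

With that sample input this should print a single object: {"description":"Some text.","relatedControls":["CA-2","SA-10","SI-1x"]}.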