
yq expression to recursively force array


yq (https://mikefarah.gitbook.io/yq/) can convert XML files to YAML or JSON.

Given a data.xml that looks like this:

<root>
  <heading>
    <member>one</member>
  </heading>
  <heading>
    <member>two</member>
    <member>three</member>
  </heading>
</root>

Running yq --input-format xml . data.xml outputs:

root:
  heading:
    - member: one
    - member:
        - two
        - three

member is output with two different types: a scalar under the first heading and an array under the second. After some digging I found:

"yq assumes consecutive nodes with the same name are assumed to be arrays. If there's only one node with a name, yq assumes its a map"

https://github.com/mikefarah/yq/issues/1583

So we can modify the query to be yq --input-format xml '.root.heading.[].member |= [] + .' data.xml which outputs:

root:
  heading:
    - member:
        - one
    - member:
        - two
        - three

However this requires passing the exact path and key name for every override, and assumes you know the hierarchy of nodes in advance.

I am handling files hand-written by users, and the nesting differs from file to file. I need a more dynamic yq expression that can match multiple map keys at any level.

yq has a Recursive Descent (Glob) operator https://mikefarah.gitbook.io/yq/operators/recursive-descent-glob

So far I have a query: yq --input-format xml '(... | select(key == "member")) |= [] + .' data.xml which outputs:

root:
  heading:
    - ? - member
      : - one
    - ? - member
      : - two
        - three

However, this is not the output I expected. Where am I going wrong?


Solution

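  • In your example, you only want to glob the value nodes (and then test whether each one's key matches your criteria); you don't want to glob the key nodes themselves. The ... operator also visits map keys, which is why the key member itself ends up wrapped in an array in your output. Thus, use .. instead of ...: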

    yq --input-format xml '(.. | select(key == "member")) |= [] + .' data.xml
    
    root:
      heading:
        - member:
            - one
        - member:
            - two
            - three
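
If the normalization ever has to happen outside yq (for example on data already parsed from yq's JSON output), the same idea can be sketched in Python: walk only the values, and wrap a value in a list when its key matches. This is a minimal sketch under assumptions, not part of yq; the helper name force_arrays and its keys parameter are hypothetical, and the input dict is simply the parsed form of data.xml shown above.

```python
from __future__ import annotations

from typing import Any


def force_arrays(node: Any, keys: set[str]) -> Any:
    """Return a copy of `node` in which every map value stored under
    one of `keys` is a list. Only values are visited, never the keys."""
    if isinstance(node, dict):
        result = {}
        for key, value in node.items():
            # Normalize nested content first, then this value itself.
            value = force_arrays(value, keys)
            if key in keys and not isinstance(value, list):
                value = [value]  # a single node becomes a one-element list
            result[key] = value
        return result
    if isinstance(node, list):
        return [force_arrays(item, keys) for item in node]
    return node  # scalars are returned unchanged


# Parsed equivalent of the data.xml example from the question.
data = {"root": {"heading": [{"member": "one"}, {"member": ["two", "three"]}]}}
normalized = force_arrays(data, {"member"})
print(normalized)
```

Passing several names in keys (for instance {"heading", "member"}) covers the case where more than one element should always be treated as an array, regardless of nesting depth.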