How can I use (Go) yq to reduce a YAML document to a desired set of keys and produce a CSV?


I have the following YAML document:

222:
  description:
    en: "124098-en"
    fr: "498438-fr"
  name:
    en: "293878-en"
    fr: "222493878-fr"
  mass: 0.1
  groupID: "24902"
223:
  description:
    en: "124098-en"
    fr: "498438-fr"
  name:
    en: "zz325-en"
    fr: "222493878-fr"
  mass: 0.1
  groupID: "234988"
[many other records]

I would like to construct a CSV that looks like:

222,"293878-en","24902"
223,"zz325-en","234988"

That is, each row is just:

  • first field: the key of the map in the original document
  • second field: the .[].name.en from the original document
  • third field: the .[].groupID from the original document

No other fields are preserved in the CSV from the original document.

What's the right way to do this?

Addendum: I'm using the Go version of yq (4.7.1) but either the Go or the Python version is fine, or if that's not the right tool here, I'm happy to use something else.


Solution

  • The Python version of yq is more straightforward to use here, because it is a wrapper around jq: it converts the YAML to JSON and runs jq on the result.

    You can use jq's constructs to get the CSV result:

    yq -r 'keys_unsorted[] as $k | [ ($k|tonumber), (.[$k] | .name.en, .groupID) ] | @csv' yaml
    

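    The @csv filter formats each array element according to its JSON type: strings are double-quoted and numbers are written bare. That is why the key is passed through tonumber (map keys are always strings in JSON), while groupID, which is quoted in the source, stays a quoted string. If groupID were stored as a bare number and you wanted it quoted, you could use .groupID | tostring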


    Go yq was quite different prior to v4, when it used its own DSL, but as of v4.8 it is working hard to implement jq's functions. It doesn't have a CSV function out of the box yet.
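
Since the question allows other tools, here is a minimal Python sketch of the same row construction, using only the standard library. It assumes the YAML has already been parsed into a dict; the `to_csv` function and the `sample` data are hypothetical stand-ins and not part of either yq. `csv.QUOTE_NONNUMERIC` is used because it quotes strings and leaves numbers bare, which is close to what jq's @csv does.

```python
import csv
import io


def to_csv(doc):
    """Return CSV text with one row per top-level key: key, name.en, groupID."""
    buf = io.StringIO()
    # QUOTE_NONNUMERIC quotes every non-numeric field and writes numbers bare.
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for key, record in doc.items():
        # Keep only the three wanted fields; everything else is dropped.
        writer.writerow([key, record["name"]["en"], record["groupID"]])
    return buf.getvalue()


# Hypothetical stand-in for the parsed YAML document, trimmed to two records.
sample = {
    222: {"name": {"en": "293878-en", "fr": "222493878-fr"}, "mass": 0.1, "groupID": "24902"},
    223: {"name": {"en": "zz325-en", "fr": "222493878-fr"}, "mass": 0.1, "groupID": "234988"},
}

if __name__ == "__main__":
    print(to_csv(sample), end="")
```

In practice the dict would come from a YAML parser, for example PyYAML's `yaml.safe_load`, which is a third-party package and is not shown here.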