Tags: terraform, kubernetes-helm, terraform-cdk

cdktf not respecting multiline string for helm values file


I have a Helm values file declared like so:

role: agent
customConfig:
  data_dir: /vector-data-dir
  api:
    enabled: false
  sources:
    graph-node-logs:
      type: kubernetes_logs
      extra_label_selector: "graph-node-indexer=true"
  transforms:
    parse_graph_node_logs:
      inputs:
        - graph-node-logs
      type: remap
      source: >
        # Extract timestamp, severity and message
        # we ignore the timestamp as explained below
        regexed = parse_regex!(.message, r'^(?P<timestamp>\w\w\w \d\d \d\d:\d\d:\d\d\.\d\d\d) (?P<severity>[A-Z]+) (?P<message>.*)$')

        # From the message, extract the subgraph id (the ipfs hash)
        message_parts = split(regexed.message, ", ", 2)
        structured = parse_key_value(message_parts[1], key_value_delimiter: ":", field_delimiter: ",") ?? {}

        # construct the final fields we care about, which are the subgraph id,
        # the severity, the message and the unix timestamp
        final_fields = {}
        final_fields.subgraph_id = structured.subgraph_id
        final_fields.message = regexed.message
        final_fields.severity = regexed.severity

        # graph node does not emit time zone information and thus we can't use the timestamp we extract
        # because we can't coerce the extracted timestamp into an unambiguous timestamp. Therefore,
        # we use the timestamp of the log event instead as seen by the source plugin (docker-logs)
        final_fields.unix_timestamp = to_unix_timestamp!(.timestamp || now(), unit: "milliseconds")

        # Add the final fields to the root object. The clickhouse integration below will discard fields unknown to the schema.
        . |= final_fields
  sinks:
    stdout:
      type: console
      encoding:
        codec: json
      target: stdout
      inputs:
        - parse_graph_node_logs
service:
  enabled: false

I'm using this values file in a TypeScript cdktf construct like so:

    const valuesAsset = new TerraformAsset(this, "vector-values", {
      path: `./${EnvConfig.name}-values.yaml`,
      type: AssetType.FILE,
    });

    new helm.Release(this, "vector", {
      repository: "https://helm.vector.dev",
      chart: "vector",
      name: "vector",
      version: "0.21.1",
      values: [Fn.file(valuesAsset.path)],
    });

However, when this values file gets piped into the Helm chart, which then creates a ConfigMap, I see that the multiline value for transforms.parse_graph_node_logs.source is not respected:

 ❯❯❯ k get configmap -o yaml vector
apiVersion: v1
data:
  vector.yaml: |
    api:
      enabled: false
    data_dir: /vector-data-dir
    sinks:
      stdout:
        encoding:
          codec: json
        inputs:
        - parse_graph_node_logs
        target: stdout
        type: console
    sources:
      graph-node-logs:
        extra_label_selector: graph-node-indexer=true
        type: kubernetes_logs
    transforms:
      parse_graph_node_logs:
        inputs:
        - graph-node-logs
        source: |
          # Extract timestamp, severity and message # we ignore the timestamp as explained below regexed = parse_regex!(.message, r'^(?P<timestamp>\w\w\w \d\d \d\d:\d\d:\d\d\.\d\d\d) (?P<severity>[A-Z]+) (?P<message>.*)$')
          # From the message, extract the subgraph id (the ipfs hash) message_parts = split(regexed.message, ", ", 2) structured = parse_key_value(message_parts[1], key_value_delimiter: ":", field_delimiter: ",") ?? {}
          # construct the final fields we care about which are the subgraph id, the severity, the message and the unix timestamp final_fields = {} final_fields.subgraph_id = structured.subgraph_id final_fields.message = regexed.message final_fields.severity = regexed.severity
          # graph node does not emit time zone information and thus we can't use the timestamp we extract # because we can't coerce the extracted timestamp into an umabiguous timestamp. Therefore, # we use the timestamp of the log event instead as seen by the source plugin (docker-logs) final_fields.unix_timestamp = to_unix_timestamp!(.timestamp || now(), unit: "milliseconds")
          # Add the final fields to the root object. The clickhouse integration below will discard fields unknown to the schema. . |= final_fields
        type: remap
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: vector
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2023-05-18T14:10:45Z"
  labels:
    app.kubernetes.io/component: Agent
    app.kubernetes.io/instance: vector
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: vector
    app.kubernetes.io/version: 0.29.1-distroless-libc
    helm.sh/chart: vector-0.21.1
  name: vector
  namespace: default
  resourceVersion: "307030904"
  uid: d9fabff2-426d-47aa-8298-4935ddef1d42

How can I get cdktf to respect this multiline values file?


Solution

  • You've used the wrong kind of YAML block scalar. In your values file you need to specify

    #       not `>`
    source: |
      # Extract timestamp, severity and message
      ...
    

    The | creates a literal block scalar, which preserves newlines. The > creates a folded block scalar, which folds newlines into spaces; that is exactly the joining you're seeing in the ConfigMap.
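
    To see the difference in isolation, here is a minimal YAML fragment (the keys are invented for the example) and the string each scalar parses to:

    ```yaml
    # Literal block scalar: newlines are preserved.
    literal: |
      line one
      line two
    # parses to "line one\nline two\n"

    # Folded block scalar: single newlines become spaces.
    folded: >
      line one
      line two
    # parses to "line one line two\n"
    ```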

    The chart's templates also need to render the multiline string correctly, but it's not clear that's part of the problem here; if the chart just passes the values through toYaml .Values.customConfig (or something similar), the string will be re-indented correctly.
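
    For reference, a chart template that renders user config this way might look roughly like the following. This is a hypothetical sketch, not the actual vector chart template:

    ```yaml
    # Hypothetical ConfigMap template. toYaml re-serializes the parsed values,
    # emitting multiline strings as `|` literal blocks, and nindent re-indents
    # the result by 4 spaces so it nests correctly under vector.yaml.
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: vector
    data:
      vector.yaml: |
        {{- toYaml .Values.customConfig | nindent 4 }}
    ```

    Because toYaml operates on the already-parsed data structure, the scalar style in the values file only matters at parse time: a > scalar has already been collapsed into a single-spaced string before the template runs, which is why the values file itself is the right place to fix this.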