
Use awk to walk a tree expressed via indentation


spec:
  replicas: 1
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app.kubernetes.io/name: myapp
      app.kubernetes.io/instance: myapp

I would like to walk the tree with (POSIX) awk generating all paths to each key:

spec
spec:replicas
spec:strategy
spec:strategy:rollingUpdate
spec:strategy:rollingUpdate:maxSurge
spec:strategy:rollingUpdate:maxUnavailable
spec:selector
spec:selector:matchLabels
spec:selector:matchLabels:app.kubernetes.io/name
spec:selector:matchLabels:app.kubernetes.io/instance

Ideally in this depth-first, pre-order ordering.

I found this related question:

awk to insert after nth occurrence with indentation

But the solution is so far from what I'm after that I wasn't able to repurpose it with my pitiful knowledge of awk.

I've now written

match($0, /[^[:space:]]/) {
    arr[RSTART]=$1;
    for (i=1; i<RSTART; i+=1) {
        printf "%s", arr[i]
    };
    print sub(/:$/, "", arr[RSTART])
}

But the output is a bizarre

1
spec1
spec1
specstrategy1
specstrategyrollingUpdate1
specstrategyrollingUpdate1
spec1
specselector1
specselectormatchLabels1
specselectormatchLabels1

instead of what I was expecting. I think that's because sub is in-place replacement instead of outputting the new value? But I have no idea where the 1s come from.


Solution

  • I would use GNU AWK for this task in the following way. Let the content of file.txt be

    spec:
      replicas: 1
      strategy:
        rollingUpdate:
          maxSurge: 1
          maxUnavailable: 0
      selector:
        matchLabels:
          app.kubernetes.io/name: myapp
          app.kubernetes.io/instance: myapp
    

    then

    awk 'match($0,/[[:alpha:]]/){arr[RSTART]=$1;for(i=1;i<RSTART;i+=1){printf "%s",arr[i]};print gensub(/:/,"",1,arr[RSTART])}' file.txt
    

    gives output

    spec
    spec:replicas
    spec:strategy
    spec:strategy:rollingUpdate
    spec:strategy:rollingUpdate:maxSurge
    spec:strategy:rollingUpdate:maxUnavailable
    spec:selector
    spec:selector:matchLabels
    spec:selector:matchLabels:app.kubernetes.io/name
    spec:selector:matchLabels:app.kubernetes.io/instance
    

    Explanation: I use the match string function to find the position of the first alphabetic character. If one is found, I store the 1st field in the array arr under that position as the key. Then, for every position lower than the current one, I output the value stored under that key; not all of these keys exist, but for a missing key printf "%s" outputs nothing. The stored values keep their trailing :, which is what separates the path parts. Finally, I print the value stored under the current position with its : removed by gensub (a GNU extension). Disclaimer: this solution assumes that a space never appears in a path part, that : appears exactly once, at the end of each part, and that a path part always starts with a letter.

    (tested in GNU Awk 5.1.0)
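
Since the question asks for POSIX awk and gensub is a GNU extension, here is a minimal sketch of a variant that uses only sub. Note that sub edits its target in place and returns the number of substitutions made, which is where the 1s in the question's output come from. The helper name yaml_paths and the names part and maxcol are made up for illustration. The sketch keeps the same assumptions as the command above (no spaces in a key, each key ends in a single :), except that it locates the first non-blank character rather than the first letter. It also deletes deeper entries when the tree moves back up, so that with uneven indentation a key from an earlier subtree should not leak into a later path.

```shell
# sub() edits its target in place and returns the number of substitutions,
# which is where the 1s in the question's output come from:
awk 'BEGIN { s = "spec:"; n = sub(/:$/, "", s); print n, s }'   # prints: 1 spec

# yaml_paths is a hypothetical helper name; it reads the files given as
# arguments, or standard input when none are given.
yaml_paths() {
    awk '
    match($0, /[^ \t]/) {                 # RSTART = column of first non-blank character
        key = $1
        sub(/:$/, "", key)                # strip the trailing colon in place
        for (i = RSTART + 1; i <= maxcol; i++)
            delete part[i]                # forget deeper levels from earlier subtrees
        maxcol = RSTART
        part[RSTART] = key
        path = ""
        for (i = 1; i < RSTART; i++)
            if (i in part)                # only columns that hold an ancestor key
                path = path part[i] ":"
        print path key
    }' "$@"
}
```

With the function defined, yaml_paths file.txt should print the same ten paths as listed above. Storing the keys without their colon and adding the separator while building the path keeps the final print free of any string-returning substitution function.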