
Get markdown heading level in awk


Given a string with a markdown heading, what would be the best way to return the level of the heading in awk?

Assumptions:

  • For this scenario, the only requirement for a line to be considered a heading is that it starts with a #
  • The level of the heading is the number of #s before any other character appears
  • If the string is not a heading, the program should return nothing
  • Must use awk, not gawk

Example 1

Input:

# Heading

Expected output:

1

Example 2

Input:

## Heading

Expected output:

2

Example 3

Input:

## This is level #2

Expected output:

2

Example 4

Example without leading #s in the provided string

This a normal paragraph with a # in the middle

Expected output:


Example 5

Example with a leading blank character

 # Heading

Expected output:


Example 6

Example with a leading \

\# Heading

Expected output:


Example 7

##Heading

Expected output:

2

Attempts

I tried using # as the field separator (FS) and NF to count the number of fields, but (of course) awk can't tell whether a # marks the heading level or is an ordinary # that is part of the title text.

echo '## Heading 2' | awk 'BEGIN{FS="#"} /^#/{print NF-1}'
# Returns 2 (right)
echo '## This is level #2' | awk 'BEGIN{FS="#"} /^#/{print NF-1}'
# Returns 3 (wrong, should be 2)
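With FS="#", each leading # produces an empty field, so the FS idea can still work if you count only the consecutive empty fields at the start of the line instead of using NF. The following is a minimal sketch, assuming one heading candidate per input line. The function name heading_level_fs is hypothetical:

```shell
# heading_level_fs is a hypothetical name used only for this sketch.
heading_level_fs() {
  awk 'BEGIN { FS = "#" }
  /^#/ {
    # Each leading # yields an empty field ($1, $2, ...). Stop at the
    # first non-empty field. The n + 1 < NF check keeps the trailing
    # empty field of an all-# line (e.g. "###") from being counted.
    n = 0
    while (n + 1 < NF && $(n + 1) == "") n++
    print n
  }'
}

printf '%s\n' '## This is level #2' | heading_level_fs   # prints 2
```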

I also tried with gsub, but to no avail (same problem):

echo '## Heading 2' | awk '/^#/{gsub(/[^#]/,""); print length;}'
# Returns 2 (right)
echo '## This is level #2' | awk '/^#/{gsub(/[^#]/,""); print length;}'
# Returns 3 (wrong, should be 2)
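The gsub() attempt fails because deleting every non-# character keeps all the #s on the line, including the one in "#2". If you instead delete everything from the first non-# character to the end of the line with sub(), only the leading run of #s is left. The following is a minimal sketch. The function name heading_level_sub is hypothetical:

```shell
# heading_level_sub is a hypothetical name used only for this sketch.
heading_level_sub() {
  # sub() removes the first non-# character and everything after it,
  # so $0 is reduced to the leading #s and length($0) is the level.
  awk '/^#/ { sub(/[^#].*/, ""); print length($0) }'
}

printf '%s\n' '## This is level #2' | heading_level_sub   # prints 2
```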

Any insights?


Solution

  • What you're asking for is:

    awk 'match($0,/^#+/){print RLENGTH}'
    

    e.g.:

    $ cat file
    # Heading
    ## Heading
    ## This is level #2
    Example without a leading #s in the provided string
    This a normal paragraph with a # in the middle
     # Heading
    \# Heading
    ##Heading
    

    $ while IFS= read -r line; do
        echo "$line"
        echo "$line" | awk 'match($0,/^#+/){print RLENGTH}'
    done < file
    # Heading
    1
    ## Heading
    2
    ## This is level #2
    2
    Example without a leading #s in the provided string
    This a normal paragraph with a # in the middle
     # Heading
    \# Heading
    ##Heading
    2
    

    Don't actually call awk one line at a time like this, though. It's extremely inefficient and error-prone compared to calling awk once on the whole file (see "Why is using a shell loop to process text considered bad practice?"):

    $ awk '{print} match($0,/^#+/){print RLENGTH}' file
    # Heading
    1
    ## Heading
    2
    ## This is level #2
    2
    Example without a leading #s in the provided string
    This a normal paragraph with a # in the middle
     # Heading
    \# Heading
    ##Heading
    2
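match() and RLENGTH are both part of POSIX awk, so this does not depend on gawk. match(s, re) returns the position of the leftmost match (also stored in RSTART), or 0 if there is none. It sets RLENGTH to the length of the match, or to -1 when nothing matches. Because the regex is anchored with ^, a # later in the line (as in "#2") is never part of the match. Because match() returns 0 for non-heading lines, using it as the pattern means nothing is printed for them. The following shows the values for one heading line and one non-heading line:

```shell
# Show match()'s return value, RSTART and RLENGTH for a heading line
# and for a line that does not start with #.
printf '%s\n' '## This is level #2' 'plain text' |
  awk '{ n = match($0, /^#+/); print n, RSTART, RLENGTH }'
# 1 1 2
# 0 0 -1
```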