How to pull headings from a Google document using the API

I'm currently trying to create a Python script that checks a Google document for various on-page SEO metrics.

The Google Docs API has a good sample showing how to extract ALL the text from a Google document. However, it simply returns plain text with no formatting.

To perform my checks I need to be able to split out the H1, the H2-H4 headings, text in bold, etc. After two hours of playing around and searching the API docs and the web, I can't figure out how to edit the following loop to get (for example) all the HEADING_2 elements.

    text = ''
    for value in elements:
        if 'paragraph' in value:
            elements = value.get('paragraph').get('elements')
            for elem in elements:
                text += read_paragraph_element(elem)
        elif 'table' in value:
            # The text in table cells is in nested Structural Elements, and tables may be
            # nested.
            table = value.get('table')
            for row in table.get('tableRows'):
                cells = row.get('tableCells')
                for cell in cells:
                    text += read_strucutural_elements(cell.get('content'))
        elif 'tableOfContents' in value:
            # The text in the TOC is also in a Structural Element.
            toc = value.get('tableOfContents')
            text += read_strucutural_elements(toc.get('content'))
    return text

Any help appreciated. Thanks.

Solution

I believe your goal and your current situation are as follows.

You want to retrieve the text of the paragraphs whose paragraph style is HEADING_2.
You want to achieve this using googleapis for Python.
You want to achieve your goal using the script in your question.
You are already able to get the values from the Google Document using the Docs API.

Modification point:

In this case, the text should be retrieved only when the paragraph's namedStyleType (found under paragraphStyle) is HEADING_2.

When this is reflected in your script, it becomes as follows.

Modified script:

From:

    for value in elements:
        if 'paragraph' in value:
            elements = value.get('paragraph').get('elements')

To:

    for value in elements:
        if 'paragraph' in value and value['paragraph']['paragraphStyle']['namedStyleType'] == 'HEADING_2':  # Modified
            elements = value.get('paragraph').get('elements')
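
If you need several heading levels at once (and the bold text mentioned in the question), the same check can be generalized. The following is only a minimal sketch, not part of the official sample: the helper names (paragraph_text, collect_by_style, collect_bold) and the sample_content data are hypothetical, and the sample is hand-written to mimic the shape of document['body']['content'] rather than taken from real API output. It assumes bold is set explicitly on the textRun's textStyle; bold inherited from a named style would not be detected this way. Tables and the table of contents are skipped for brevity.

```python
from collections import defaultdict


def paragraph_text(paragraph):
    """Join the content of every textRun in one paragraph dict."""
    return ''.join(
        elem.get('textRun', {}).get('content', '')
        for elem in paragraph.get('elements', [])
    )


def collect_by_style(content):
    """Group paragraph text by namedStyleType, e.g. 'HEADING_2'."""
    grouped = defaultdict(list)
    for value in content:
        paragraph = value.get('paragraph')
        if paragraph is None:
            continue  # section breaks, tables and the TOC are skipped here
        style = paragraph.get('paragraphStyle', {}).get('namedStyleType')
        text = paragraph_text(paragraph).strip()
        if style and text:
            grouped[style].append(text)
    return dict(grouped)


def collect_bold(content):
    """Return the text of every run whose textStyle explicitly sets bold."""
    bold_runs = []
    for value in content:
        for elem in value.get('paragraph', {}).get('elements', []):
            run = elem.get('textRun', {})
            if run.get('textStyle', {}).get('bold'):
                bold_runs.append(run.get('content', '').strip())
    return bold_runs


# Hypothetical, hand-written data shaped like document['body']['content'].
sample_content = [
    {'sectionBreak': {}},
    {'paragraph': {
        'paragraphStyle': {'namedStyleType': 'HEADING_1'},
        'elements': [{'textRun': {'content': 'Page title\n', 'textStyle': {}}}],
    }},
    {'paragraph': {
        'paragraphStyle': {'namedStyleType': 'HEADING_2'},
        'elements': [{'textRun': {'content': 'First section\n', 'textStyle': {}}}],
    }},
    {'paragraph': {
        'paragraphStyle': {'namedStyleType': 'NORMAL_TEXT'},
        'elements': [
            {'textRun': {'content': 'Some ', 'textStyle': {}}},
            {'textRun': {'content': 'bold', 'textStyle': {'bold': True}}},
            {'textRun': {'content': ' text\n', 'textStyle': {}}},
        ],
    }},
    {'paragraph': {
        'paragraphStyle': {'namedStyleType': 'HEADING_2'},
        'elements': [{'textRun': {'content': 'Second section\n', 'textStyle': {}}}],
    }},
]

by_style = collect_by_style(sample_content)
print(by_style.get('HEADING_1', []))  # ['Page title']
print(by_style.get('HEADING_2', []))  # ['First section', 'Second section']
print(collect_bold(sample_content))   # ['bold']
```

With a real document you would pass document.get('body').get('content') instead of sample_content. Using .get() with defaults avoids a KeyError on elements that have no paragraphStyle or textRun.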

Reference:

NamedStyleType