Getting the h1 from Markdown with Python's pandoc library

I'm writing a Python batch script that processes many Markdown files and extracts the h1 text to generate a 'title' metadata variable (I forgot to add 'title' to the frontmatter). I'm not using this as a pandoc filter.

So I was thinking of processing those files with the Python pandoc library, but I'm not familiar with it and can't figure out how to get only the h1.

content = pandoc.read(post.content)

'content' is in pandoc's native format, and I see something like this:

(Pdb) content
Pandoc(Meta({}), [Header(1, ('foobar', [], []), [Str('foobar:')]), Para(...

I would like to get the h1 as plain text.

Solution

The following snippet works for headers written with either # or =======. Note that elt[1][0] is the header's identifier (for example 'foobar'), not its visible text.

import pandoc
from pandoc.types import Header

# Read a file here, or use your own content instead.
with open('README.md') as f:
    content = pandoc.read(f.read())

headers = []

for elt in pandoc.iter(content):
    if isinstance(elt, Header):
        # Level-1 headers only; remove this check to collect all headers.
        if elt[0] == 1:
            # elt[1] is the attribute tuple; its first item is the identifier.
            headers.append(elt[1][0])

Or, if you want the exact header text, with its original capitalization and punctuation:

for elt in pandoc.iter(content):
    if isinstance(elt, Header):
        # Level-1 headers only; remove this check to collect all headers.
        if elt[0] == 1:
            # elt[-1] is the header's inline content; render it back to Markdown text.
            headers.append(pandoc.write(elt[-1]).strip())
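If pandoc is not installed on the machine running the batch job, a rough standard-library fallback might be enough for simple files. This is a different technique from the pandoc approach above: it matches lines with regular expressions instead of parsing the document. The name first_h1 and the patterns are hypothetical, and the sketch assumes the frontmatter has already been stripped from the text. It does not remove inline markup such as emphasis, and its handling of fenced code blocks is simplistic.

```python
import re

# ATX heading of level 1: a single '#', whitespace, the text, optional closing '#'s.
ATX_H1 = re.compile(r"^ {0,3}#[ \t]+(.*?)(?:[ \t]+#+)?[ \t]*$")
# Setext underline for level 1: a line made only of '=' characters.
SETEXT_UNDERLINE = re.compile(r"^ {0,3}=+[ \t]*$")
# Opening or closing marker of a fenced code block.
FENCE = re.compile(r"^ {0,3}(```|~~~)")


def first_h1(markdown_text):
    """Return the text of the first level-1 heading, or None if there is none."""
    in_fence = False
    previous = ""
    for line in markdown_text.splitlines():
        if FENCE.match(line):
            # Toggle on every fence marker so '#' lines inside code are skipped.
            in_fence = not in_fence
            previous = ""
            continue
        if in_fence:
            continue
        atx = ATX_H1.match(line)
        if atx:
            return atx.group(1).strip()
        # A '====' line only makes a heading if the line above it has text.
        if SETEXT_UNDERLINE.match(line) and previous.strip():
            return previous.strip()
        previous = line
    return None
```

Calling first_h1(post.content) for each file would give a candidate 'title' value, with None signalling that the file has no h1 and needs manual attention.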