For a Markdown document I want to filter out all sections whose header titles are not in the list to_keep
. A section consists of a header and the body until the next section or the end of the document. For simplicity lets assume that the document only has level 1 headers.
When I make a simple case distinction on whether the current element has been preceeded by a header in to_keep
and do either return None
or return []
I get an error. That is, for pandoc --filter filter.py -o output.pdf input.md
I get TypeError: panflute.dump needs input of type "panflute.Doc" but received one of type "list"
(code, example file and complete error message at the end).
I use Python 3.7.4 and panflute 1.12.5 and pandoc 2.2.3.2.
If make a more fine grained distinction on when to do return []
, it works (function action_working
). My question is, why is this more fine grained distinction neccesary? My solution seems to work, but it might well be accidental... How can I get this to work properly?
Traceback (most recent call last):
File "filter.py", line 42, in <module>
main()
File "filter.py", line 39, in main
return run_filter(action_not_working, doc=doc)
File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 266, in run_filter
return run_filters([action], *args, **kwargs)
File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 253, in run_filters
dump(doc, output_stream=output_stream)
File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 132, in dump
raise TypeError(msg)
TypeError: panflute.dump needs input of type "panflute.Doc" but received one of type "list"
Error running filter filter.py:
Filter returned error status 1
# English
Some cool english text this is!
# Deutsch
Dies ist die deutsche Übersetzung!
# Sources
Some source.
# Priority
**Medium** *[Low | Medium | High]*
# Status
**Open for Discussion** *\[Draft | Open for Discussion | Final\]*
# Interested Persons (mailing list)
- Franz, Heinz, Karl
from panflute import *
to_keep = ['Deutsch', 'Status']
keep_current = False
def action_not_working(elem, doc):
'''For every element we check if it occurs in a section we wish to keep.
If it is, we keep it and return None (indicating to keep the element unchanged).
Otherwise we remove the element (return []).'''
global to_keep, keep_current
update_keep(elem)
if keep_current:
return None
else:
return []
def action_working(elem, doc):
global to_keep, keep_current
update_keep(elem)
if keep_current:
return None
else:
if isinstance(elem, Header):
return []
elif isinstance(elem, Para):
return []
elif isinstance(elem, BulletList):
return []
def update_keep(elem):
'''if the element is a header we update to_keep.'''
global to_keep, keep_current
if isinstance(elem, Header):
# Keep if the title of a section is in too keep
keep_current = stringify(elem) in to_keep
def main(doc=None):
return run_filter(action_not_working, doc=doc)
if __name__ == '__main__':
main()
I think what happens is that panflute call the action on all elements, including the Doc
root element. If keep_current
is False
when walking the Doc
element, it will be replaced by a list. This leads to the error message you are seeing, as panflute expectes the root node to always be there.
The updated filter only acts on Header
, Para
, and BulletList
elements, so the Doc
root node will be left untouched. You'll probably want to use something more generic like isinstance(elem, Block)
instead.
An alternative approach could be to use panflute's load
and dump
elements directly: load the document into a Doc
element, manually iterate over all blocks in args
and remove all that are unwanted, then dump the resulting doc back into the output stream.
from panflute import *
to_keep = ['Deutsch', 'Status']
keep_current = False
doc = load()
for top_level_block in doc.args:
# do things, remove unwanted blocks
dump(doc)