In my GitHub-Flavored Markdown file webkalk.md I have this line:
<span custom-style="OS">something</span>
In the reference.docx that I pass to pandoc, I declared a style named "OS".
When I generate my .docx with this command:
pandoc -s webkalk.md > webkalk.docx -f markdown -t docx --reference-doc="reference.docx"
the word "something" is styled the way I intended (style "OS"), but when I run this command:
pandoc -s webkalk.md > webkalk.docx -f gfm -t docx --reference-doc="reference.docx"
it is styled just like plain text.
Is it possible to use custom styles for docx in GitHub-Flavored Markdown?
gfm does not support the native_spans extension. Pandoc's default markdown format enables most of the extensions Pandoc provides, including native_spans, by default.
However, as the documentation explains:
Note, however, that commonmark and gfm have limited support for extensions. Only those listed below (and smart, raw_tex, and hard_line_breaks) will work. The extensions can, however, all be individually disabled. Also, raw_tex only affects gfm output, not input.
gfm (GitHub-Flavored Markdown): pipe_tables, raw_html, fenced_code_blocks, auto_identifiers, gfm_auto_identifiers, backtick_code_blocks, autolink_bare_uris, space_in_atx_header, intraword_underscores, strikeout, task_lists, emoji, shortcut_reference_links, angle_brackets_escapable, lists_without_preceding_blankline.
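To check what your own pandoc installation reports, pandoc --list-extensions=FORMAT prints one extension per line, prefixed with + (enabled by default for that format) or - (disabled by default). The Python sketch below is a hypothetical helper, not part of pandoc: it runs that command for markdown and gfm and shows how each reports native_spans. It assumes pandoc is on the PATH. Depending on the pandoc version, gfm may list native_spans as disabled or omit it entirely; older versions may list every extension pandoc knows about, so the documentation note quoted above remains the authority on what the gfm reader actually honours.

```python
import subprocess


def parse_extension_list(listing: str) -> dict[str, bool]:
    """Parse `pandoc --list-extensions=FORMAT` output into {name: enabled_by_default}."""
    extensions = {}
    for line in listing.splitlines():
        line = line.strip()
        # Each line looks like "+pipe_tables" (on by default) or "-native_spans" (off).
        if len(line) > 1 and line[0] in "+-":
            extensions[line[1:]] = line[0] == "+"
    return extensions


def list_extensions(fmt: str) -> dict[str, bool]:
    """Ask the installed pandoc which extensions it reports for `fmt`."""
    result = subprocess.run(
        ["pandoc", f"--list-extensions={fmt}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_extension_list(result.stdout)


if __name__ == "__main__":
    for fmt in ("markdown", "gfm"):
        try:
            exts = list_extensions(fmt)
        except (FileNotFoundError, subprocess.CalledProcessError) as err:
            # pandoc missing from PATH, or it rejected the format name.
            print(f"could not query pandoc for {fmt}: {err}")
            continue
        print(fmt, "native_spans:", exts.get("native_spans", "not listed"))
```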
By way of explanation, the native_spans and native_divs extensions parse raw HTML span and div tags, respectively, and convert them into Pandoc's native Span and Div elements. That allows the content and any associated attributes (such as custom-style) to be passed to the output format, if the output format supports them. Without the extension, however, any output format that does not support HTML directly will only get the plain text content of the raw HTML, which is the behavior you are seeing.
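A toy model may make the difference concrete. The fragments below are hand-written approximations of how pandoc's JSON AST (pandoc -t json) represents the line from the question: with -f markdown, native_spans turns the tag into a Span carrying the custom-style attribute; with -f gfm, the tags remain RawInline html nodes around a plain Str. The exact JSON can vary by pandoc version. The function render_like_docx and the two constants are hypothetical and are not how pandoc's docx writer is implemented; the sketch only models the one relevant behavior, that raw HTML has nowhere to go in a .docx, so only the text survives.

```python
# Hand-written approximation of `pandoc -f markdown -t json` for the span:
# native_spans produces a Span node whose attributes include custom-style.
MARKDOWN_INLINES = [
    {
        "t": "Span",
        "c": [
            ["", [], [["custom-style", "OS"]]],  # [identifier, classes, key-values]
            [{"t": "Str", "c": "something"}],
        ],
    },
]

# Approximation of `pandoc -f gfm -t json`: the tags stay raw HTML.
GFM_INLINES = [
    {"t": "RawInline", "c": ["html", '<span custom-style="OS">']},
    {"t": "Str", "c": "something"},
    {"t": "RawInline", "c": ["html", "</span>"]},
]


def render_like_docx(inlines, style=None):
    """Toy writer (hypothetical): returns a list of (text, style) runs."""
    runs = []
    for node in inlines:
        kind = node["t"]
        if kind == "Str":
            runs.append((node["c"], style))
        elif kind == "Span":
            (_ident, _classes, attrs), children = node["c"]
            # A Span's custom-style attribute becomes the style of its runs.
            span_style = dict(attrs).get("custom-style", style)
            runs.extend(render_like_docx(children, span_style))
        elif kind == "RawInline":
            # Raw HTML tags cannot be expressed in .docx, so this model drops them.
            continue
    return runs


print(render_like_docx(MARKDOWN_INLINES))
print(render_like_docx(GFM_INLINES))
```

With the markdown fragment the single run keeps the "OS" style; with the gfm fragment the same text comes out with no style, matching the plain-text result described in the question.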
commonmark and gfm are each defined by strict specifications, so it appears that Pandoc does not allow much divergence from those specs. Therefore, the native_spans and native_divs extensions are not supported when using the gfm format.
The documentation warns about this:
Because pandoc’s intermediate representation of a document is less expressive than many of the formats it converts between, one should not expect perfect conversions between every format and every other. ... While conversions from pandoc’s Markdown to all formats aspire to be perfect, conversions from formats more expressive than pandoc’s Markdown can be expected to be lossy.
The important thing to remember here is that "pandoc's Markdown" (the markdown format) is the only format for which Pandoc aspires to lossless conversion. The gfm format is not "pandoc's Markdown," so it carries no such promise.
That said, it might seem that gfm should at least support the native_spans extension, even if it is not enabled by default. However, the CommonMark spec (which GFM extends) completely reworked how raw HTML is parsed. Presumably, Pandoc needed to redefine the methods that parse raw HTML for the commonmark and gfm formats, so the extensions that work on raw HTML do not work with the alternate parser. In other words, any extension that operates on raw HTML, including native_spans, would need to be rewritten to work with the commonmark and gfm formats. Until that happens, those extensions are not available when using those formats. Whether Pandoc plans to add support in the future is not information I am privy to, and it is out of scope for this discussion.