How to retain HTML styles after conversion to docx with Pandoc


I have an HTML file that looks like this:

<!DOCTYPE html>
<html>
<head>
<style>
h1 {text-align:center;}
p {text-align:center;}
</style>
</head>
<body>

<h1>My heading</h1>
<p>Some poetry here.</p>

</body>
</html>

I want to convert it to docx with pandoc. I tried the usual command:

pandoc -s test.html -o test.docx

The text is rendered correctly, but it is not centered. I am generating hundreds of HTML files automatically, so fixing each one by hand is not an option. I need some paragraphs left-aligned (the default) and others centered, because they are poetry. How can this be achieved?

Thank you.

PS: I could also use Markdown as the input format instead of HTML.


Solution

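  • You need to customize a docx reference document (a template) and apply it when converting HTML to docx. In your case, <h1> is converted to the Heading 1 style in Word, and <p> is converted to First Paragraph, the style pandoc assigns to the first paragraph after a heading (later paragraphs get Body Text).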

    Steps:

    1. Create a docx template.

      pandoc -o custom-reference.docx --print-default-data-file reference.docx

    2. Open custom-reference.docx and modify the styles:

      1. Center Heading 1
      2. Center First Paragraph

    3. Save custom-reference.docx.

    4. Convert.

      pandoc input.html -o output.docx --reference-doc custom-reference.docx
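
Since the question mentions hundreds of generated files, the convert step above could be wrapped in a loop. The following is only a sketch: it assumes the HTML files sit in the current directory and that custom-reference.docx is next to them; the helper names `docx_name` and `convert_all` are made up for illustration.

```shell
#!/bin/sh
# Sketch: convert every .html file in the current directory to .docx,
# applying the customized reference document to each one.
# The file layout and helper names are assumptions, not part of pandoc.

REFERENCE_DOC="custom-reference.docx"

# Map an input path such as ./poems/ode.html to ./poems/ode.docx.
docx_name() {
  printf '%s\n' "${1%.html}.docx"
}

convert_all() {
  for src in ./*.html; do
    # When no .html files exist the pattern stays unexpanded; skip it.
    [ -e "$src" ] || continue
    pandoc "$src" -o "$(docx_name "$src")" --reference-doc "$REFERENCE_DOC"
  done
}

convert_all
```

Note that centering First Paragraph affects every paragraph that uses that style in every converted file. If only the poetry should be centered, the pandoc manual describes a `custom-style` attribute for docx output that can be attached to a div in Markdown input, with the matching style defined in the reference document; that may be a better fit for mixed alignment.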