
Is there a way to stop Pandoc from wrapping every line in a paragraph and from nesting the image inside the list item?


Good morning!

I followed the answers to "Pandoc: avoid paragraphs or add css class to paragraph?", but they do not solve my problem. I also looked at another question: "Disable pandoc convert the image’s alt text to a paragraph when docx to markdown".

Here is a small example:

1. Lorem ipsum dolor sit amet consectetur adipisicing elit. Hic, reprehenderit.
2. Lorem ipsum dolor sit, amet consectetur adipisicing elit:

    ![example](assets/images/iuacessos-preferences.png)

3. Lorem ipsum dolor sit amet consectetur adipisicing elit. Enim voluptates similique ab doloremque delectus veniam.

I ran the following:

pandoc bug.md -f markdown_github+fenced_divs-implicit_figures-native_divs+raw_html -t html -o bug.html

Here is the output:

<ol>
  <li>
    <p>Lorem ipsum dolor sit amet consectetur adipisicing elit. Hic,
      reprehenderit.</p>
  </li>
  <li>
    <p>Lorem ipsum dolor sit, amet consectetur adipisicing elit:</p>
    <p><img src="assets/images/iuacessos-preferences.png" alt="example" /></p>
  </li>
  <li>
    <p>Lorem ipsum dolor sit amet consectetur adipisicing elit. Enim
      voluptates similique ab doloremque delectus veniam.</p>
  </li>
</ol>

You can see that Pandoc adds a p element around every line, including the content of each li element. It also wrapped the img element in a p and nested both inside the li element.

The code should be like:

<ol>
  <li>Lorem ipsum dolor sit amet consectetur adipisicing elit. Hic, reprehenderit.</li>
  <li>Lorem ipsum dolor sit, amet consectetur adipisicing elit:</li>
    <img src="assets/images/iuacessos-preferences.png" alt="example" />
  <li>Lorem ipsum dolor sit amet consectetur adipisicing elit. Enim voluptates similique ab doloremque delectus veniam.</li>
</ol>

It is elegant and clean. By contrast, GitHub produces exactly this output: it doesn't wrap every line in a p element and doesn't nest the image inside the li element.

Note that I mostly use markdown_github because it supports more features than other Pandoc Markdown variants.


Solution

  • Note that recent pandoc versions say Deprecated: markdown_github. Use gfm instead.

    So what you should be using is:

    pandoc -f gfm -o bug.html bug.md
    

    which parses the input as GitHub-Flavored Markdown, the dialect GitHub itself uses.

    Note that the HTML you posted under "the code should be like" is invalid, since an <ol> can only have <li> as direct children. Perhaps you meant:

    <ol>
      <li>Lorem ipsum dolor sit amet consectetur adipisicing elit. Hic, reprehenderit.</li>
      <li>
        Lorem ipsum dolor sit, amet consectetur adipisicing elit:
        <img src="assets/images/iuacessos-preferences.png" alt="example" />
      </li>
      <li>Lorem ipsum dolor sit amet consectetur adipisicing elit. Enim voluptates similique ab doloremque delectus veniam.</li>
    </ol>
    

    Running that HTML through pandoc -f html -t gfm gives the corresponding Markdown:

    1.  Lorem ipsum dolor sit amet consectetur adipisicing elit. Hic,
        reprehenderit.
    2.  Lorem ipsum dolor sit, amet consectetur adipisicing elit:
        ![example](assets/images/iuacessos-preferences.png)
    3.  Lorem ipsum dolor sit amet consectetur adipisicing elit. Enim
        voluptates similique ab doloremque delectus veniam.
    

    If you're wondering why you get the <p> around the image:

    From the MANUAL:

    A paragraph is one or more lines of text followed by one or more blank lines.

    And why you get the <p> around the list items:

    A bullet list is a list of bulleted list items. A bulleted list item begins with a bullet (*, +, or -). Here is a simple example:

    * one
    * two
    * three
    

    This will produce a “compact” list. If you want a “loose” list, in which each item is formatted as a paragraph, put spaces between the items:

    * one
    
    * two
    
    * three
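
To make the compact/loose rule quoted above concrete, here is a rough Python sketch. It is a deliberate simplification, not Pandoc's or CommonMark's actual algorithm: it treats a list as loose as soon as any blank line appears between its first and last non-blank lines, and it ignores edge cases such as blank lines inside nested code blocks. The names is_loose, tighten and QUESTION_LIST are made up for this illustration.

```python
def is_loose(list_source: str) -> bool:
    """Heuristic: True if a blank line sits between the first and last non-blank lines."""
    lines = list_source.splitlines()
    # Leading and trailing blank lines do not separate items, so trim them.
    while lines and not lines[0].strip():
        lines.pop(0)
    while lines and not lines[-1].strip():
        lines.pop()
    return any(not line.strip() for line in lines)


def tighten(list_source: str) -> str:
    """Drop every blank line so the items and their content become contiguous."""
    return "\n".join(line for line in list_source.splitlines() if line.strip())


# The list from the question, with blank lines around the image (assumed verbatim).
QUESTION_LIST = """\
1. Lorem ipsum dolor sit amet consectetur adipisicing elit. Hic, reprehenderit.
2. Lorem ipsum dolor sit, amet consectetur adipisicing elit:

    ![example](assets/images/iuacessos-preferences.png)

3. Lorem ipsum dolor sit amet consectetur adipisicing elit. Enim voluptates similique ab doloremque delectus veniam.
"""

if __name__ == "__main__":
    print(is_loose(QUESTION_LIST))           # True: blank lines surround the image
    print(is_loose(tighten(QUESTION_LIST)))  # False: no blank lines remain
    print(tighten(QUESTION_LIST))
```

Under this heuristic the list in the question counts as loose because of the blank lines around the image, which is consistent with the <p> wrappers in the output. The tightened version has the same shape as the Markdown shown earlier in this answer, with the image line directly under the text of item 2.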