
Why does Pandoc swallow video HTML tags?


Here's a minimal example of my problem:

$ echo '<video><source src="filename.mp4" type="video/mp4"></video>' \
    | pandoc -f html -t html
> (empty output)

The problem seems to come from the parsing stage. If I drop the from-format option (-f html), pandoc happily passes the input through, only formatting it nicely (it then reads the input as Markdown and keeps the HTML as a raw block). That might have been good enough, except that I really need pandoc to parse the contents and include them in the document tree, so that it is aware of the necessary styling and so on.

I tried this in their online sandbox as well, and saw the following messages:

<video controls><source src="filename.mp4" type="video/mp4"></video>
---
> Skipped '<video controls>' at input line 1 column 1
> Skipped '<source src="filename.mp4" type="video/mp4">' at input line 1 column 17
> Skipped '</video>' at input line 1 column 61
(empty output)

So, basically, why is this tag being skipped?

What have I tried? I have tried variations on the input, such as putting the video tag inside a paragraph, but it always disappears.

I have also been fiddling with various flags, such as --self-contained or --embed-resources, but I don't really know what they are meant to do, and they didn't help anyway. The final pandoc command in my Makefile (the one currently swallowing the video tags) has the --standalone flag, but that seems beside the point here.


Solution

  • First, for the why: videos are not part of pandoc's internal document representation, so it is not entirely clear how they should be handled. Adding a video as an image is reasonable, and you could raise a feature request for this.

    As an alternative to the nice <img> workaround mentioned above, one could also enable the raw_html format extension:

    pandoc -f html+raw_html -t html ...
    

    With this extension, elements the HTML reader cannot represent are kept in the document tree as raw HTML instead of being skipped, and the HTML writer emits them unchanged.