Is there a simple way to process an HTML file so that tags matching a certain CSS selector are deleted? My motivation is that pandoc generates HTML output that is, in my view, too verbose: it surrounds every math expression with <span class="math inline"> ... </span>, when generally ... is enough. For display math, the input and output tend to contain line breaks, so a dedicated tool may be better than grep or similar. The goal is to reduce bandwidth usage, so anything client-side is out.
Pandoc inserts those span tags so that JavaScript libraries such as MathJax can display the math properly. You can of course remove them with your HTML processing tool of choice, e.g. Nokogiri if you're using Ruby. Put something like this in removespans.rb:
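require 'nokogiri'

# Parse the standalone HTML file produced by pandoc
doc = Nokogiri::HTML(File.read("file.html"))
# Unwrap each math span: keep its children, drop only the surrounding tag
doc.css('span.math').each { |span| span.replace(span.children) }
puts doc.to_html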
then execute:
pandoc -s -o file.html input.md
ruby removespans.rb > output.html
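
If installing Nokogiri is not an option, a rough alternative is a multi-line regular expression using only Ruby's standard library. This is a sketch, not a robust HTML parser: it assumes the math spans carry exactly the class "math inline" or "math display" and contain no nested span elements. The names MATH_SPAN and unwrap_math_spans are made up for illustration.

```ruby
# Matches pandoc's math wrappers (assumed to contain no nested <span>).
# The /m flag lets "." match newlines, so display math spanning lines works.
MATH_SPAN = %r{<span class="math (?:inline|display)">(.*?)</span>}m

# Returns the HTML with each math wrapper removed and its contents kept.
# The block form of gsub inserts the capture literally, so backslashes
# in the math (e.g. \( ... \)) are not reinterpreted.
def unwrap_math_spans(html)
  html.gsub(MATH_SPAN) { Regexp.last_match(1) }
end

sample = '<p>Let <span class="math inline">\(x\)</span> be real.</p>'
puts unwrap_math_spans(sample)
```

To apply it to a file, replace the last two lines with something like puts unwrap_math_spans(File.read("file.html")) and redirect the output as above. Spans with any other class are left alone, which is the main difference from removing every span.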