I have an XHTML file with an HTML5 Shiv in the head:
<!--[if lt IE 9]>
<script src='../common/html5.js' type='text/javascript'></script>
<![endif]-->
Using Nokogiri, I need to adjust the path in that comment by stripping off the leading ../ segment. However, any change to the .content of the comment node results in XML output that converts the > and < to entities:
XML = <<ENDXML
<r><!--[if lt IE 9]>
<script src='../common/html5.js' type='text/javascript'></script>
<![endif]--></r>
ENDXML
require 'nokogiri'
doc = Nokogiri.XML(XML)
comment = doc.at_xpath('//comment()')
comment.content = comment.content.sub('../','')
puts comment.to_xml
#=> <!--[if lt IE 9]&gt;
#=> &lt;script src='common/html5.js' type='text/javascript'&gt;&lt;/script&gt;
#=> &lt;![endif]-->
The original source is valid XML/XHTML. How can I stop Nokogiri from escaping the > and < inside this comment when I edit it?
The Nokogiri docs for content= say:
The string gets XML escaped, not interpreted as markup.
So rather than using that, you could replace the node with a new one, using replace and an explicitly created comment node:
XML = <<ENDXML
<r><!--[if lt IE 9]>
<script src='../common/html5.js' type='text/javascript'></script>
<![endif]--></r>
ENDXML
require 'nokogiri'
doc = Nokogiri.XML(XML)
comment = doc.at_xpath('//comment()')
# this line is the new one, replacing comment.content= ...
comment.replace Nokogiri::XML::Comment.new(doc, comment.content.sub('../',''))
# note `comment` is the old comment, so to see the changes
# look at the whole document
puts doc.to_xml
Output is:
<?xml version="1.0"?>
<r>
<!--[if lt IE 9]>
<script src='common/html5.js' type='text/javascript'></script>
<![endif]-->
</r>