I need to pull a fragment out of a large XML file and work only with that fragment.
require 'nokogiri'
xml = <<XMLEND
<CFRDOC xsi:noNamespaceSchemaLocation="CFRMergedXML.xsd">
<TITLE>
<SUBTITLE>
<CHAPTER>
<TOC></TOC>
<PART></PART>
<PART></PART>
<PART>
<EAR>Pt. 1903</EAR>
<HD SOURCE="HED">PART 1903—INSPECTIONS, CITATIONS AND PROPOSED PENALTIES</HD>
<CONTENTS></CONTENTS>
<AUTH></AUTH>
<SOURCE></SOURCE>
<SECTION>section1</SECTION>
<SECTION>section2</SECTION>
<SECTION>section3</SECTION>
<SECTION>section4</SECTION>
</PART>
</CHAPTER>
</SUBTITLE>
</TITLE>
</CFRDOC>
XMLEND
doc = Nokogiri::HTML(xml)
section = doc.xpath("//section")
# I can grab a specific node...
section[3].text
=> "section4"
# copy it
temp = section[3].dup
=> #<Nokogiri::XML::Element:0x261ce64 name="section" children=[#<Nokogiri::XML::Text:0x261c98c "section4">]>
# ...but xpath on the copy still searches the whole document
doc.xpath("//part").size
=> 3
section.xpath("//part").size
=> 3
temp.xpath("//part").size
=> 3
Coming from a PHP background, I'm having to rethink variables a bit. I know variables work differently in Ruby: they are references to objects.
Therefore, when I run temp.xpath, I'm actually running it on doc. But I want to grab a specific node and its children, and then work on it as a new object. This would narrow down the haystack immensely and make the rest of my job much easier!
How do I create a new object using only the node I have selected? I want to turn section[3] into a new object that can't see the other <part> elements and their <section> tags.
Use to_xml to turn temp back into an XML string, then use Nokogiri::XML again to parse that string into a new, independent document.
my_section = Nokogiri::XML(temp.to_xml)
my_section.xpath('//part').size
# => 0
puts my_section
# <?xml version="1.0"?>
# <section>section4</section>
(I'm not sure why you're using Nokogiri::HTML to begin with, but you can substitute it for XML here if you think you need to.)