
How to grab a node and work on it as a new object


I need to pull a fragment out of a large XML file and work only with that fragment.

require 'nokogiri'

xml = <<XMLEND
<CFRDOC xsi:noNamespaceSchemaLocation="CFRMergedXML.xsd">
    <TITLE>
        <SUBTITLE>
            <CHAPTER>
                <TOC></TOC>
                <PART></PART>
                <PART></PART>
                <PART>
                    <EAR>Pt. 1903</EAR>
                    <HD SOURCE="HED">PART 1903—INSPECTIONS, CITATIONS AND PROPOSED PENALTIES</HD>
                    <CONTENTS></CONTENTS>
                    <AUTH></AUTH>
                    <SOURCE></SOURCE>
                    <SECTION>section1</SECTION>
                    <SECTION>section2</SECTION>
                    <SECTION>section3</SECTION>
                    <SECTION>section4</SECTION>
                </PART>
            </CHAPTER>
        </SUBTITLE>
    </TITLE>
</CFRDOC>
XMLEND

doc = Nokogiri::HTML(xml)

section = doc.xpath("//section")

# I can grab a specific node...
section[3].text
# => "section4"

# copy it
temp = section[3].dup
# => #<Nokogiri::XML::Element:0x261ce64 name="section" children=[#<Nokogiri::XML::Text:0x261c98c "section4">]>

# but the variable still refers to the whole...
doc.xpath("//part").size
# => 3
section.xpath("//part").size
# => 3
temp.xpath("//part").size
# => 3

Coming from a PHP background, I'm having to rethink variables a bit. I know variables work differently in Ruby: a variable holds a reference to an object.
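That mental model can be checked in plain Ruby, with no gems involved. The array and its contents below are made up purely for illustration. Assignment gives a second name to the same object, while `dup` allocates a new one. Nokogiri's `Node#dup` likewise returns a new node object, but that copy still belongs to the original document.

```ruby
# Two names, one object: assignment copies the reference, not the object.
original = ["section4"]
same = original

# dup allocates a new Array (a shallow copy: the String inside is shared).
copy = original.dup

# Mutating through `same` is visible through `original`, but not in `copy`.
same << "extra"

p original              # ["section4", "extra"]
p copy                  # ["section4"]
p same.equal?(original) # true
p copy.equal?(original) # false
```

Because `Array#dup` is shallow, the `"section4"` string itself is still shared between `original` and `copy`. Only the two arrays are distinct objects.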

Therefore, when I run temp.xpath, I'm actually running it on doc. But I want to grab a specific node and its children, and then work on it as a new object. This would narrow down the haystack immensely and make the rest of my job much easier!

How do I create a new object using only the node I have selected? I want to turn section[3] into a new object that can't see the other <part> elements and their <section> children.


Solution

  • Use to_xml to serialize temp back into an XML string, then parse that string with Nokogiri::XML to get a new, independent document.

    my_section = Nokogiri::XML(temp.to_xml)
    my_section.xpath('//part').size
    # => 0
    
    puts my_section
    # <?xml version="1.0"?>
    # <section>section4</section>
    

    (I'm not sure why you're using Nokogiri::HTML to begin with, but you can use it here in place of Nokogiri::XML if you need to.)