
How to traverse DOM (children/siblings) using watir-webdriver?


I'm used to accessing elements with PHP's Simple HTML DOM Parser (SHDP), but I'm now using Ruby with watir-webdriver, and I'm wondering whether it can replace SHDP's functionality as far as accessing elements on a page goes.

So in SHDP I'd do this:

$ret = $html->find('div[id=foo]');

This returns an array of every div with id=foo, where $html is the HTML source of a specified URL. I'd then loop over it:

foreach ($ret as $element) {
    echo $element->first_child()->first_child()->first_child()->first_child()->first_child()->first_child()->first_child()->plaintext . '<br>';
}

Here, each ->first_child() descends one level further into the div with id=foo (notice there are seven calls), and then I print the plaintext of the element at the seventh level. Something like this

<div id="foo">
    <div ...>
        <div ...>
            <div ...>
                <div ...>
                    <div ...>
                        <div ...>
                            <div ...>HAPPINESS</div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
</div>

would get "HAPPINESS" printed. So, my question is: how can this be done using watir-webdriver (if at all possible)?

Also, more generally, how can I get SHDP's DOM-traversing abilities (children, siblings, and so on) in watir-webdriver?

I ask because, if watir-webdriver can't do this, I'll have to find a way to pipe the page source from a watir-webdriver browser instance to a PHP script that uses SHDP, and then somehow get the relevant information back into Ruby.


Solution

  • Watir implements an :index feature (zero-based):

    browser.div(id: 'foo').divs           # children
    browser.div(id: 'foo').div(index: 6)  # nth-child
    browser.div(id: 'foo').parent         # parent
    browser.div(id: 'foo').div            # first-child
    browser.div(id: 'foo').div(index: -1) # last-child
    
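As a rough illustration of how the seven chained first_child() calls from the question could map onto Watir, here is a minimal sketch. The helper name `nested_div_text` is hypothetical (it is not part of Watir), and it relies only on the `div` and `text` calls shown in this answer. It assumes each level's first descendant div is also its first child, as in the question's markup.

```ruby
# Hypothetical helper (not part of Watir): follow the first nested div
# `depth` times, then return the text of the element reached.
def nested_div_text(element, depth)
  depth.times { element = element.div }
  element.text
end

# Usage against a live page, assuming the watir-webdriver gem and a browser
# driver are installed (the URL is a placeholder):
#
#   require 'watir-webdriver'
#   browser = Watir::Browser.new
#   browser.goto 'http://example.com'
#   puts nested_div_text(browser.div(id: 'foo'), 7)
#   puts browser.div(id: 'foo').div(index: 6).text  # same element, by index
```

For the question's markup, both calls in the usage comment should reach the innermost div, since `index: 6` picks the seventh descendant div in document order.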

    next_sibling and previous_sibling are not currently implemented. If you think they are necessary for your code, please comment here: https://github.com/watir/watir/pull/270
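Until sibling methods exist, one possible workaround is a relative XPath passed through Watir's `:xpath` locator. The sketch below is an assumption about how that could look, not a built-in API. The helper names `next_sibling_of` and `previous_sibling_of` are hypothetical.

```ruby
# Hypothetical helpers (not part of Watir): locate an element's adjacent
# siblings with relative XPath axes.
def next_sibling_of(element)
  # following-sibling::*[1] is the closest sibling after the element
  element.element(xpath: './following-sibling::*[1]')
end

def previous_sibling_of(element)
  # preceding-sibling::*[1] is the closest sibling before the element
  element.element(xpath: './preceding-sibling::*[1]')
end

# Usage, assuming `browser` is an open Watir::Browser:
#
#   puts next_sibling_of(browser.div(id: 'foo')).text
```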

    Note that in general you should prefer using indexes to using collections, but these also work:

    browser.div(id: 'foo').divs.first
    browser.div(id: 'foo').divs.last
    

    Paperback code example (are you looking to select by text or obtain the text?):

    browser.li(text: /Paperback/)  
    browser.td(class: "bucket").li
    browser.table(id: 'productDetailsTable').li
    

    We've also had requests in the past to support things like direct children instead of parsing all of the descendants: https://github.com/watir/watir/issues/329
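In the meantime, direct children can be approximated with a relative XPath, because `divs` and `div(index: n)` search all descendants. This is a sketch of one workaround, assuming the `:xpath` locator. The helper name `direct_child_div_texts` is hypothetical.

```ruby
# Hypothetical helper (not part of Watir): collect the text of an element's
# direct child divs only, skipping deeper descendants.
def direct_child_div_texts(element)
  element.elements(xpath: './div').map(&:text)
end

# Usage, assuming `browser` is an open Watir::Browser:
#
#   puts direct_child_div_texts(browser.div(id: 'foo'))
```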

    We're actively working on improvements for upcoming versions of Watir. If this solution does not work for you, please post a suggestion with your ideal syntax at https://github.com/watir/watir/issues and we'll see how we can support it.