php html dom xml-parsing simple-html-dom

Does Simple HTML DOM support :has-like parsing?


I have to parse an HTML structure like this:

<div class='container'>
    <div class='inner-div'>
        <span class='text'>...</span>
        <div class='author'>
            <span data-author='Alpha'>...</span>
        </div>
        <div class='summary'>
            <span data-summary='Exclusive'>Text 1</span>
        </div>
    </div>
    <div class='inner-div'>
        <span class='text'>...</span>
        <div class='author'>
            <span data-author='Beta'>...</span>
        </div>
        <div class='summary'>
            <span data-summary='Non-Exclusive'>Text 2</span>
        </div>
    </div>
    <div class='inner-div'>
        <span class='text'>...</span>
        <div class='author'>
            <span data-author='Gamma'>...</span>
        </div>
        <div class='summary'>
            <span data-summary='Exclusive'>Text 3</span>
        </div>
    </div>
    <div class='inner-div'>
        <span class='text'>...</span>
        <div class='author'>
            <span data-author='Delta'>...</span>
        </div>
        <div class='summary'>
            <span data-summary='Non-Exclusive'>Text 4</span>
        </div>
    </div>
    ...
    <div class='inner-div'>
        <span class='text'>...</span>
        <div class='author'>
            <span data-author='Zeta'>...</span>
        </div>
        <div class='summary'>
            <span data-summary='Exclusive'>Text 5</span>
        </div>
    </div>
</div>

I want to get the first 'Exclusive' summary whose author is not 'Alpha'. In the example above, that would be 'Text 3'. How can I do this with Simple HTML DOM, or even with XML DOM?

ADDENDUM: I want to parse the HTML with the PHP Simple HTML DOM library. I know how to do this in jQuery, but Simple HTML DOM doesn't seem to support any equivalent of :has.


Solution

  • No, but here's a Simple HTML DOM replacement that does (you want :not rather than :has, by the way):

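    include_once('advanced_html_dom.php');
    
    // $str is assumed to hold the HTML from the question.
    $html = str_get_html($str);
    
    // Skip .author blocks whose child has data-author=Alpha, then take the
    // first Exclusive summary among their following siblings (index 0).
    echo $html->find('.author:not(> [data-author=Alpha]) ~ .summary > [data-summary=Exclusive]', 0);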
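
Since the question also asks about XML DOM: the same lookup can probably be done with PHP's built-in DOMDocument and DOMXPath, no extra library needed. This is only a sketch under a few assumptions: the markup is a trimmed copy of the sample from the question, the class attributes are exactly 'inner-div', 'author' and 'summary' (no additional classes), and the variable names ($str, $query, $result) are just for illustration.

```php
<?php
// Trimmed copy of the sample markup from the question (three inner-div blocks).
$str = <<<HTML
<div class='container'>
    <div class='inner-div'>
        <div class='author'><span data-author='Alpha'>...</span></div>
        <div class='summary'><span data-summary='Exclusive'>Text 1</span></div>
    </div>
    <div class='inner-div'>
        <div class='author'><span data-author='Beta'>...</span></div>
        <div class='summary'><span data-summary='Non-Exclusive'>Text 2</span></div>
    </div>
    <div class='inner-div'>
        <div class='author'><span data-author='Gamma'>...</span></div>
        <div class='summary'><span data-summary='Exclusive'>Text 3</span></div>
    </div>
</div>
HTML;

// Parse the fragment; libxml warnings about loose HTML are collected, not printed.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($str);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Keep inner-div blocks that do NOT contain an Alpha author, descend to their
// Exclusive summary span, and take the first match in document order.
$query = "(//div[@class='inner-div']"
       . "[not(div[@class='author']/span[@data-author='Alpha'])]"
       . "/div[@class='summary']/span[@data-summary='Exclusive'])[1]";

$node = $xpath->query($query)->item(0);
$result = $node !== null ? trim($node->textContent) : null;

echo $result, PHP_EOL;
```

The not(...) predicate plays the role of jQuery's :not(:has(...)) here. If the real page puts several classes on one element, the exact @class='...' comparisons would need to be loosened, for example with contains().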