html css perl mojo-dom

CSS selection using Mojo::DOM


This is a multidisciplinary question so the answer may not be purely CSS.

I am parsing a large table and my goal is to retrieve only the text outside of the <b></b> tags. I am able to select the rows, but I am stuck on how to select only the text outside of the bold tag.

HTML

<div id="tabs-1">
<table width='650' class='subtblfont'>
    <tr><td>&nbsp;</td></tr>
    <tr><td>&nbsp;</td></tr>
    <tr>
        <td><b>Check-in Date:&nbsp;</b>04/20/2013</td>
        <td><b>Check-in Date:&nbsp;</b>04/25/2013</td>
    </tr>
</table>
</div>

Code

$row_content = $results_dom->find('div#tabs-1 tr:nth-child(3) td');

foreach (@$row_content) {
    print "$_\n";
}

Output

<td><b>Check-in Date:&nbsp;</b>04/20/2013</td>
<td><b>Check-in Date:&nbsp;</b>04/25/2013</td>

Desired Output

04/20/2013
04/25/2013

I am able to use regular expressions to pull out the text but that is not an ideal solution at this point. Is there a way to select only the non-bold text?


Solution

  • From the Documentation:

    text

    Extract text content from this element only (not including child elements).

    Try giving this a shot:

    (Granted, I don't really know Perl, so if I got the syntax wrong... sorry)

    use feature 'say';    # say is not enabled by default
    $row_content = $results_dom->find('div#tabs-1 tr:nth-child(3) td')->each(sub { say $_->text });
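
For anyone who wants to try this end to end, here is a minimal sketch of the same idea as a standalone script. It assumes a reasonably recent Mojolicious (one where Mojo::Collection has map), and it assumes the div id is "tabs-1" so that it matches the selector used in the question. The @dates array is just an illustrative name. The text method returns only the td's own text nodes, so the label inside <b> is left out.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Sample markup from the question. The id "tabs-1" is an assumption,
# chosen so that it matches the selector below.
my $html = <<'HTML';
<div id="tabs-1">
<table width='650' class='subtblfont'>
    <tr><td>&nbsp;</td></tr>
    <tr><td>&nbsp;</td></tr>
    <tr>
        <td><b>Check-in Date:&nbsp;</b>04/20/2013</td>
        <td><b>Check-in Date:&nbsp;</b>04/25/2013</td>
    </tr>
</table>
</div>
HTML

my $results_dom = Mojo::DOM->new($html);

# text() covers only the element's own text nodes, so the <b> child
# and its label are skipped. each() without a callback returns a list.
my @dates = $results_dom->find('div#tabs-1 tr:nth-child(3) td')
                        ->map(sub { $_->text })
                        ->each;

say for @dates;
```

If you ever need the label as well, all_text on the same element should include the text of child elements too.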