PHP: Extract data between specific tags from an HTML file


I have a PHP script that displays an HTML page. I need to extract the innerHTML of a specific element; the exact content is shown below.

What I need to extract is the 0.0225 value. Here is a fragment of the HTML file:

<tr>
    <td>Income</td>
    <td id="income">
        <font color="green">
            <span data-c="2250000">0.0225 RP</span>
        </font>
    </td>
</tr>

I tried parsing it with a regex (I know that is not recommended, but I tried it anyway) and got nothing. I also tried different DOM implementations for PHP, with the same result. I don't know what else to try, so I'm asking: how can I extract those numbers so that I can edit them and put them back?

So, here are my attempts:

The attempt with RegEx:

$html = file_get_contents('the link');    
$regex = '#<td id="income"><font color="green"><span data-c="[.*]">(.*?) BTC</span></font></td>#';
if (preg_match($regex, $html)){echo yay;};

The attempt with DOM:

$html = file_get_contents('the link');    
$dom = new DOMDocument();
$dom->load($html);
$element = $dom->getElemetById("income")->innerHTML;

Solution

  • It's not worth going into detail about why your regex doesn't work, IMO. For general regex knowledge, though: a . does not match newlines (unless the s modifier is used), and inside a character class .* is not a wildcard, so [.*] matches a single literal . or * character. The pattern also expects BTC, while the fragment contains RP.
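
    For illustration only, here is a minimal sketch of a pattern that does match the fragment from the question. It assumes the markup keeps exactly this shape (same attributes, same nesting). The use of \s* for the whitespace between tags and [^"]* for the attribute value is my own choice and not part of the original attempt. The DOM approach is still the more robust option.

    ```php
    <?php
    // Same fragment as in the question, including the line breaks between tags.
    $html = '<tr>
        <td>Income</td>
        <td id="income">
            <font color="green">
                <span data-c="2250000">0.0225 RP</span>
            </font>
        </td>
    </tr>';

    // \s* tolerates newlines and indentation between the tags,
    // [^"]* matches any attribute value (unlike [.*], which matches one "." or "*"),
    // and the unit is "RP", as in the fragment.
    $regex = '#<td id="income">\s*<font color="green">\s*<span data-c="[^"]*">(.*?) RP</span>#';

    // Passing $matches as the third argument exposes the captured group.
    if (preg_match($regex, $html, $matches)) {
        echo $matches[1];
    }
    ```

    Any change to the attribute order, quoting, or nesting would break this pattern, which is the usual reason for preferring a parser.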

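    For DOMDocument, note that load() expects a file path, whereas loadHTML() parses an HTML string. The method is also spelled getElementById, and DOMElement has no innerHTML property; read nodeValue instead. Beyond that, you need to go further down the DOM tree to reach the value. XPath works well for this.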

    $html = '<tr>
        <td>Income</td>
        <td id="income">
            <font color="green">
                <span data-c="2250000">0.0225 RP</span>
            </font>
        </td>
    </tr>';
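    $dom = new DOMDocument();
    // loadHTML() parses an HTML string; load() would expect a file path.
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    // Walk down from the td with id="income" to the span holding the value.
    // nodeValue is the span's text content, including the unit.
    echo $xpath->query('//tr/td[@id="income"]/font/span')->item(0)->nodeValue;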
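
The question also asks about editing the number and putting it back. A possible sketch follows, assuming the value always leads the text and the unit stays RP. Doubling the amount is a purely hypothetical edit chosen for illustration, and the shorter XPath expression is my own variation on the one above.

```php
<?php
// Inline copy of the fragment; in the real script this would come from file_get_contents().
$html = '<tr>
    <td>Income</td>
    <td id="income">
        <font color="green">
            <span data-c="2250000">0.0225 RP</span>
        </font>
    </td>
</tr>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// item(0) returns null when nothing matches, so guard before using the node.
$span = $xpath->query('//td[@id="income"]//span')->item(0);

if ($span !== null) {
    // An explicit float cast reads the leading number and ignores the trailing unit.
    $income = (float) $span->nodeValue;

    // Hypothetical edit: double the amount and write it back with the same unit.
    $span->nodeValue = number_format($income * 2, 4) . ' RP';

    // saveHTML($node) serializes just that node; saveHTML() with no argument
    // would serialize the whole document.
    echo $dom->saveHTML($span);
}
```

To persist the change, the full output of saveHTML() could be written out with file_put_contents(); where it should be written depends on your setup.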