
php, strpos extract digit from string


I have a large HTML document to scan. Until now I have been using preg_match_all to extract the parts I need, but it has been extremely CPU-intensive from the start, so we decided to try a different extraction method. I have read articles comparing the performance of preg_match with strpos; they claim that strpos can be up to 20 times faster than the regex scanner. I would like to try this approach, but I don't really know how to get started.

Let's say I have this HTML string:

<li id="ncc-nba-16451" class="che10"><a href="/en/star">23 - Star</a></li>
<li id="ncd-bbt-5674" class="che10"><a href="/en/moon">54 - Moon</a></li>
<li id="ertw-cxda-c6543" class="che10"><a href="/en/sun">34,780 - Sun</a></li>

I want to extract only the number from each id and only the text (letters) from the content of the a tags, so I run this preg_match_all scan:

'/<li.*?id=".*?([\d]+)".*?<a.*?>.*?([\w]+)<\/a>/s'

Here you can see the result: LINK

Now, if I wanted to replace this method with strpos, what would the approach look like? I understand that strpos returns the index at which the match starts. But how can I use it to:

  • get all possible matches, not just one
  • extract numbers or text from a desired place in the string

Thank you for all the help and tips ;)


Solution

  • Using DOM

    $html = '
    <html>
    <head></head>
    <body>
    <li id="ncc-nba-16451" class="che10"><a href="/en/star">23 - Star</a></li>
    <li id="ncd-bbt-5674" class="che10"><a href="/en/moon">54 - Moon</a></li>
    <li id="ertw-cxda-c6543" class="che10"><a href="/en/sun">34,780 - Sun</a></li>
    </body>
    </html>';
    
    
    $dom_document = new DOMDocument();
    
    $dom_document->loadHTML($html);
    
    $rootElement = $dom_document->documentElement;
    
    $getId = $rootElement->getElementsByTagName('li');
    $res = [];
    foreach($getId as $tag)
    {
       $data = explode('-',$tag->getAttribute('id'));
       $res['li_id'][] = end($data);
    }
    $getNode = $rootElement->getElementsByTagName('a');
    foreach($getNode as $tag)
    {
       // Read the link's own text rather than the text of its parent <li>
       $res['a_node'][] = $tag->textContent;
    }
    print_r($res);
    

    Output:

    Array
    (
        [li_id] => Array
            (
                [0] => 16451
                [1] => 5674
                [2] => c6543
            )
    
        [a_node] => Array
            (
                [0] => 23 - Star
                [1] => 54 - Moon
                [2] => 34,780 - Sun
            )
    
    )
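
  • Using strpos

    The question also asks how strpos could collect every match and pull out only the digits and letters. The following is a minimal sketch, not a drop-in replacement for the regex: it assumes the markup is as regular as in the sample (each item starts with `<li id="`, followed later by one `<a ...>...</a>`). The helper name `between` is hypothetical, introduced here only for illustration. The third argument of strpos (the offset) is what lets the loop continue past the previous match. Unlike the DOM output above, which keeps `c6543`, this sketch keeps only the trailing digits of each id.

```php
<?php
// Hypothetical helper: returns the text between $open and $close,
// searching from $offset, and advances $offset past $close.
function between(string $haystack, string $open, string $close, int &$offset): ?string
{
    $start = strpos($haystack, $open, $offset);
    if ($start === false) {
        return null;
    }
    $start += strlen($open);
    $end = strpos($haystack, $close, $start);
    if ($end === false) {
        return null;
    }
    $offset = $end + strlen($close);
    return substr($haystack, $start, $end - $start);
}

$html = '<li id="ncc-nba-16451" class="che10"><a href="/en/star">23 - Star</a></li>
<li id="ncd-bbt-5674" class="che10"><a href="/en/moon">54 - Moon</a></li>
<li id="ertw-cxda-c6543" class="che10"><a href="/en/sun">34,780 - Sun</a></li>';

$res = [];
$offset = 0;
// Each pass consumes one <li>; the loop ends when no further '<li id="' is found.
while (($id = between($html, '<li id="', '"', $offset)) !== null) {
    // Jump to the opening <a ...> tag, then take everything up to </a>.
    $aStart = strpos($html, '<a ', $offset);
    if ($aStart === false) {
        break;
    }
    $offset = $aStart;
    $text = between($html, '>', '</a>', $offset);
    if ($text === null) {
        break;
    }

    // Trailing digits of the id: count digit characters from the end.
    $digitCount = strspn(strrev($id), '0123456789');
    $digits = substr($id, strlen($id) - $digitCount);

    // Last word of the link text: everything after the final space.
    $spacePos = strrpos($text, ' ');
    $word = $spacePos === false ? $text : substr($text, $spacePos + 1);

    $res[] = [$digits, $word];
}
print_r($res);
```

    Whether this is actually faster than preg_match_all on real input is something to measure on your own data. The sketch also breaks as soon as the markup deviates from the assumed shape (different attribute order, nested tags), which is where the DOM approach is more robust.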