html, parsing, image-extraction

Extracting *relevant* image from a web-page


I have a couple of Twitter-powered news aggregation websites, and I have been planning to add images from the articles that I find on Twitter.

If I download the page and extract images using the <img> tag, I get a bunch of images, not all of them relevant to the article. For example, images of buttons, icons, ads, etc. are captured. How do I extract the image accompanying the article? I know there is a solution: the Facebook link sharer does this pretty well.

Mithun

Duplicate of: How to find and extract "main" image in website


Solution

  • It's been a long time, but this may help next time.

    You can use this API: https://urlmeta.org/

    It is very simple to use, and the result is just what we need.

    Example of using the API:

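    <?php
    $url = "http://timesofindia.indiatimes.com/business/india-business/Raghuram-Rajan-not-fit-to-be-RBI-Governor-Subramanian-Swamy/articleshow/52236298.cms";

    // URL-encode the target so its special characters survive as a query parameter.
    $result = file_get_contents('https://api.urlmeta.org/?url=' . urlencode($url));
    if ($result === false) {
        exit("Request to urlmeta failed\n");
    }

    // Decode the JSON response into an associative array.
    $array = json_decode($result, true);
    // Print the main image URL, if the API returned one.
    print_r($array['meta']['image'] ?? 'No image found');

    ?>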
    

    And that's the result you needed.
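
If you would rather not depend on a third-party service, a similar result can often be had by reading the page's own metadata. Link previews such as Facebook's are driven by Open Graph tags, so many article pages declare their lead image in `<meta property="og:image">`. The sketch below is only an illustration under that assumption: `extractMainImage` is a hypothetical helper name, the `twitter:image` fallback is my own addition, and pages that declare neither tag will simply yield `null`.

```php
<?php
// Hypothetical helper (not part of any library): returns the image URL declared
// in the page's og:image or twitter:image meta tag, or null if neither exists.
function extractMainImage(string $html): ?string
{
    // DOMDocument::loadHTML rejects empty input, so bail out early.
    if (trim($html) === '') {
        return null;
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Try Open Graph first, then the Twitter Card equivalent.
    $queries = [
        '//meta[@property="og:image"]/@content',
        '//meta[@name="twitter:image"]/@content',
    ];
    foreach ($queries as $query) {
        $nodes = $xpath->query($query);
        if ($nodes !== false && $nodes->length > 0) {
            $value = trim($nodes->item(0)->nodeValue);
            if ($value !== '') {
                return $value;
            }
        }
    }
    return null;
}

// Usage with an inline sample page (example.com is a placeholder).
$html = '<html><head><meta property="og:image" content="https://example.com/lead.jpg"></head>'
      . '<body><img src="/icon.png"></body></html>';
echo extractMainImage($html), "\n"; // prints https://example.com/lead.jpg
```

In practice you would pass in the HTML you already downloaded for the article. Button, icon, and ad images are skipped because only the declared meta tag is read, not the `<img>` elements.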