
Get url value in querystring of each entry link in Google News RSS XML for Facebook Sharer


Hi, I'm using SimpleXML to display a news.google.com feed.

The displayed entries link to the original article in this way:

http://news.google.com/news/url?sa=t&fd=R&ct2=us&usg=AFQjCNEcqhcp4AfUzgxc2l1gumydaxQ-KQ&clid=c3a7d30bb8a4878e06b80cf16b898331&cid=52778832126843&ei=keFLVfiHGvDVmQL5_4GgBg&url=http://WEBSITEWITHNEWS.COM/ARTICLEURLHERE

I need the entries to link to this instead: http://WEBSITEWITHNEWS.COM/ARTICLEURLHERE

The reason is that Facebook Sharer cannot interpret the following link:

https://www.facebook.com/sharer/sharer.php?u=http://news.google.com/news/url?sa=t&fd=R&ct2=us&usg=AFQjCNEcqhcp4AfUzgxc2l1gumydaxQ-KQ&clid=c3a7d30bb8a4878e06b80cf16b898331&cid=52778832126843&ei=keFLVfiHGvDVmQL5_4GgBg&url=http://WEBSITEWITHNEWS.COM/ARTICLEURLHERE

Facebook Sharer needs it to look like this:

https://www.facebook.com/sharer/sharer.php?u=http://WEBSITEWITHNEWS.COM/ARTICLEURLHERE

Is there a way to use str_replace or a regex (preg_match) to strip the Google redirect part of the URL so that social sharing sites can recognize the link?

The Google redirect URL is dynamic, so it is slightly different each time, and I need something that handles every variant.

My working code:

    $feed = file_get_contents("https://news.google.com/news/feeds?q=KEYWORD&output=rss");
    $xml = new SimpleXmlElement($feed);
    foreach ($xml->channel->item as $entry) {
        // Reformat the publication date
        $date = $entry->pubDate;
        $date = strftime("%m/%d/%y %I:%M:%S%P", strtotime($date));
        // Clean up the description HTML, which contains the article link
        $desc = $entry->description;
        $desc = str_replace("and more »", "", "$desc");
        $desc = str_replace("font-size:85%", "font-size:100%", "$desc");
        ?>
        <div class="item"></div>
        <?php echo $desc; ?>
        <div class="date">
        <?php echo $date; ?></div>
        <?php } ?>

I'm using $desc to display the link instead of $link, but the URL to the article, including the Google redirect, is also available in $link if you would rather apply str_replace or preg_match to $link instead of $desc.

Link to a working Google News feed: https://news.google.com/news/feeds?q=KEYWORD&output=rss


Solution

  • You could use the built-in PHP functions parse_url (split URL into components) and parse_str (get parameter values from query string) for this:

    $feed = file_get_contents(
        "https://news.google.com/news/feeds?q=KEYWORD&output=rss"
    );
    $xml = new SimpleXmlElement($feed);
    
    foreach ($xml->channel->item as $entry){
        // Get query part of link (cast the SimpleXMLElement to a string first)
        $query = parse_url((string) $entry->link, PHP_URL_QUERY);
    
        // Parse query parameters into $params array
        parse_str((string) $query, $params);
    
        // Get URL from parameters; fall back to the original link if "url" is missing
        $url = $params['url'] ?? (string) $entry->link;
    
        // Just output in this example
        echo "URL: $url", PHP_EOL;
    
        // ... Do some more stuff
    }
    

    Output:

    URL: http://www.gamasutra.com/blogs/JonathanRaveh/20150506/242840/Death_of_the_app_keyword__whats_next.php
    URL: http://www.business2community.com/online-marketing/8-keyword-optimization-tips-perfect-ppc-campaigns-01222200
    URL: http://searchengineland.com/marry-keywords-compelling-content-218174
    ...
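
To cover the Facebook Sharer part of the question, the same parse_url / parse_str approach can be wrapped in a small helper and the result passed on as the `u` parameter. This is only a sketch: the function names `extractTargetUrl` and `facebookShareUrl` are made up for this example, and percent-encoding the article URL with rawurlencode is my assumption about what the sharer expects, not something stated in the question.

```php
<?php
// Hypothetical helpers for illustration; the names are not from the original answer.

/**
 * Return the article URL carried in the "url" query parameter of a
 * Google News redirect link, or the link unchanged if there is none.
 */
function extractTargetUrl(string $link): string
{
    $query = parse_url($link, PHP_URL_QUERY);
    if (!is_string($query)) {
        return $link; // no query string, or the URL could not be parsed
    }

    parse_str($query, $params);

    return (isset($params['url']) && is_string($params['url']))
        ? $params['url']
        : $link;
}

/**
 * Build a Facebook Sharer link. The article URL is percent-encoded so that
 * its own "?" and "&" are not read as parameters of sharer.php (assumption).
 */
function facebookShareUrl(string $articleUrl): string
{
    return 'https://www.facebook.com/sharer/sharer.php?u=' . rawurlencode($articleUrl);
}

// Shortened version of the redirect link from the question
$link = 'http://news.google.com/news/url?sa=t&fd=R&ct2=us&url=http://WEBSITEWITHNEWS.COM/ARTICLEURLHERE';

echo facebookShareUrl(extractTargetUrl($link)), PHP_EOL;
// prints https://www.facebook.com/sharer/sharer.php?u=http%3A%2F%2FWEBSITEWITHNEWS.COM%2FARTICLEURLHERE
```

Inside the foreach loop this would be called as `extractTargetUrl((string) $entry->link)`. One caveat: if the article URL arrives unencoded and contains its own `&` parameters, parse_str will cut it off at the first `&`, so it is worth checking a few real feed entries.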