Tags: php, parsing, blogspot

Parsing BlogId from Blogspot.com in PHP using Regex


How can I get the blogID from a given blogspot.com URL? Looking at the source code of a blogspot.com page, I found this:

<link rel="EditURI" type="application/rsd+xml" title="RSD" href="http://www.blogger.com/rsd.g?blogID=4899870735344410268" />

How can I parse this to get the number 4899870735344410268?
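Once the `href` value has been read out of that `<link>` tag, the number itself does not need a regex either: PHP's built-in `parse_url()` and `parse_str()` split off the query string and decode it. A minimal sketch using the URL from the snippet above; the helper name `blogIdFromRsdUrl` is hypothetical.

```php
<?php
// Hypothetical helper: pull the blogID query parameter out of an RSD URL.
function blogIdFromRsdUrl(string $href): ?string
{
    // PHP_URL_QUERY yields only the part after '?', or null if there is none.
    $query = parse_url($href, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null;
    }

    // parse_str() decodes "blogID=...&x=y" into an associative array.
    parse_str($query, $params);
    $id = $params['blogID'] ?? null;

    // Keep the ID as a string so it is never reinterpreted as a number.
    return is_string($id) && $id !== '' ? $id : null;
}

echo blogIdFromRsdUrl('http://www.blogger.com/rsd.g?blogID=4899870735344410268'), PHP_EOL;
```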


Solution

  • Use DOMDocument to parse the document and then use its methods to retrieve the wanted element.

    I cannot stress this enough: never use regular expressions to parse an HTML document.

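    function getBlogId($url) {
      // Download the page, following any redirects (e.g. http -> https).
      $ch = curl_init($url);
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
      curl_setopt($ch, CURLOPT_AUTOREFERER, true);
      curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
      // SSL peer verification is left at its default (enabled); disabling it
      // would let a man-in-the-middle serve you a forged page.
      $page = curl_exec($ch);
      curl_close($ch);

      // curl_exec() returns false on failure, and loadHTML() rejects empty input.
      if ($page === false || $page === '') {
        return false;
      }

      // Parse the HTML, collecting libxml warnings instead of printing them.
      $doc = new DOMDocument();
      $prevErrors = libxml_use_internal_errors(true);
      $doc->loadHTML($page);
      libxml_clear_errors();
      libxml_use_internal_errors($prevErrors);

      foreach ($doc->getElementsByTagName('link') as $link) {
        // Only the <link rel="EditURI"> element carries the blogID.
        if ($link->getAttribute('rel') !== 'EditURI') {
          continue;
        }

        // e.g. http://www.blogger.com/rsd.g?blogID=4899870735344410268
        $query = parse_url($link->getAttribute('href'), PHP_URL_QUERY);
        if (!$query) {
          continue;
        }

        $queryComp = array();
        parse_str($query, $queryComp);

        // isset-style check avoids an "undefined index" warning.
        if (!empty($queryComp['blogID'])) {
          return $queryComp['blogID'];
        }
      }

      return false;
    }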
    

    Example use:

    $id = getBlogId('http://thehouseinmarrakesh.blogspot.com/');
    echo $id; // 483911541311389592
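Because `getBlogId()` downloads the page itself, it cannot be tried without network access. One variation, sketched here under the assumption that you already have the page's HTML as a string, moves the parsing into its own function and uses `DOMXPath` to select the `EditURI` link's `href` directly; the function name `extractBlogId` is hypothetical.

```php
<?php
// Hypothetical helper: find the blogID in already-downloaded Blogspot HTML.
function extractBlogId(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }

    // Collect libxml warnings about sloppy real-world HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select the href attribute of every <link rel="EditURI">.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//link[@rel="EditURI"]/@href') as $attr) {
        $query = parse_url($attr->nodeValue, PHP_URL_QUERY);
        if (!is_string($query)) {
            continue;
        }
        parse_str($query, $params);
        if (!empty($params['blogID']) && is_string($params['blogID'])) {
            return $params['blogID'];
        }
    }

    return null;
}

$html = '<html><head><link rel="EditURI" type="application/rsd+xml" title="RSD" '
      . 'href="http://www.blogger.com/rsd.g?blogID=4899870735344410268" /></head></html>';
echo extractBlogId($html), PHP_EOL;
```

Keeping download and parsing apart means the parser can be checked against a saved copy of a page, and `getBlogId()` could simply fetch `$page` with cURL as before and return `extractBlogId($page) ?? false`.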