php, rss, squarespace

Do Squarespace RSS feeds block requests made from PHP?


Squarespace provides a built-in RSS feed for every blog hosted on its service, and you can view that feed by appending ?format=rss to the blog's URL. For example, http://denverdarling.com/home is a Squarespace blog whose RSS feed is available at http://denverdarling.com/home?format=rss

When I type the feed URL into a browser's address bar, the RSS contents display without any trouble. However, when I try to fetch the same contents with a PHP script, I get this error every time: "HTTP request failed! HTTP/1.0 400 Bad Request"

I have tried several PHP functions to fetch the content, including file_get_contents, fopen, simplexml_load_file, and DOMDocument::load(). I have also tried several different Squarespace blogs. Every combination results in the same "HTTP request failed! HTTP/1.0 400 Bad Request" error.

The only thing I find when I google the topic is that you can't pull the RSS feed for a password-protected blog. None of the blogs I've tried are password-protected, so I'm not sure what's going on.


Solution

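  • It is possible that they are blocking requests that lack a browser-like User-Agent header. By default, PHP's HTTP stream wrapper sends no User-Agent header at all, so setting one explicitly is worth trying: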

    <?php
    
    $url = "http://denverdarling.com/home?format=rss";
    
    $options = array(
      'http' => array(
        'method' => "GET",
        // Send a browser-like User-Agent (here, an iPad running Safari).
        'header' => "Accept-Language: en\r\n" .
                    "User-Agent: Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.10\r\n"
      )
    );
    
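    // Attach the custom headers to the request through a stream context.
    $context = stream_context_create($options);
    $file = file_get_contents($url, false, $context);
    
    // file_get_contents() returns false on failure, otherwise the raw RSS XML.
    var_dump($file);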
    

    This works. Squarespace, or its host, appears to be checking the headers of each request and rejecting requests that do not look like they come from a browser.
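
The question also mentions simplexml_load_file and DOMDocument::load(), which do not accept a context argument. A minimal sketch of one way to apply the same idea to them follows, assuming the missing User-Agent really is the cause of the 400 response. The helper names build_feed_context_options and rss_item_titles, the example User-Agent string, and the 10-second timeout are all hypothetical choices for illustration, and there is no guarantee that Squarespace accepts this particular User-Agent value.

```php
<?php

/**
 * Hypothetical helper: build HTTP stream-context options that send a
 * User-Agent header with every request made through the context.
 */
function build_feed_context_options(string $userAgent): array
{
    return [
        'http' => [
            'method'     => 'GET',
            'header'     => "Accept-Language: en\r\n",
            'user_agent' => $userAgent, // used because 'header' sets no User-Agent
            'timeout'    => 10,         // seconds; an arbitrary choice
        ],
    ];
}

/**
 * Hypothetical helper: return the item titles of an RSS 2.0 document,
 * or an empty array if the string is not well-formed XML.
 */
function rss_item_titles(string $xml): array
{
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $feed = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($feed === false) {
        return [];
    }

    $titles = [];
    foreach ($feed->xpath('/rss/channel/item/title') ?: [] as $title) {
        $titles[] = (string) $title;
    }
    return $titles;
}

// Usage sketch (performs a network request, so it is left commented out):
//
// $options = build_feed_context_options('Mozilla/5.0 (compatible; ExampleFeedReader/1.0)');
// $context = stream_context_create($options);
//
// // Make libxml-based loaders (simplexml_load_file, DOMDocument::load) use the context.
// libxml_set_streams_context($context);
// $feed = simplexml_load_file('http://denverdarling.com/home?format=rss');
//
// // Or fetch the raw XML first and parse it separately.
// $xml = file_get_contents('http://denverdarling.com/home?format=rss', false, $context);
// if ($xml !== false) {
//     print_r(rss_item_titles($xml));
// }
```

Keeping the fetch and the parse as separate steps makes it easier to tell whether a failure comes from the HTTP request (file_get_contents returns false) or from the XML itself (rss_item_titles returns an empty array).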