Squarespace provides a built-in RSS feed for every blog hosted on its service, and you can view the feed for any blog by appending ?format=rss to the end of the blog's URL. For example, http://denverdarling.com/home is a Squarespace blog, and you can view its RSS feed at http://denverdarling.com/home?format=rss
When I type the feed URL into a browser's address bar, the RSS contents display without any trouble. However, when I try to pull the same contents with a PHP script, I get an error every time that says "HTTP request failed! HTTP/1.0 400 Bad Request"
I have tried a few different PHP functions to pull the content, including file_get_contents, fopen, simplexml_load_file, DOMDocument()->load(), etc., and I have also tried several different Squarespace blogs. Every combination results in the same "HTTP request failed! HTTP/1.0 400 Bad Request" error.
The only thing I find when I google the topic is that you can't pull the RSS feed for a password-protected blog, but since none of the blogs I've tried are password-protected, I'm not sure what's going on.
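It is possible that they are blocking requests that carry no User-Agent header. By default, PHP's HTTP stream wrapper does not send one, so you can try supplying a browser-like one through a stream context: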
$url = "http://denverdarling.com/home?format=rss";
$options = array(
    'http' => array( // options for the http:// stream wrapper
        'header' => "Accept-language: en\r\n" .
            // Browser-like User-Agent, i.e. an iPad
            "User-Agent: Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.10\r\n"
    )
);
$context = stream_context_create($options);
$file = file_get_contents($url, false, $context);
This works, so they or their host must be checking the headers in the request and filtering out requests that do not look like they come from a browser.
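Since the question also mentions fopen, simplexml_load_file and DOMDocument's load, here is a minimal sketch of applying the same User-Agent idea to those calls without passing a context to each one. The helper name feed_context_options is hypothetical (it is not part of PHP or Squarespace), and whether the server accepts any particular User-Agent string other than the iPad one above is an assumption I have not verified.

```php
<?php
// Hypothetical helper: builds http stream context options that send a
// browser-like User-Agent header.
function feed_context_options(string $userAgent): array
{
    return array(
        'http' => array(
            // 'user_agent' is a standard http context option; it is used
            // when the 'header' option does not already set User-Agent.
            'user_agent' => $userAgent,
            'header'     => "Accept-language: en\r\n",
        ),
    );
}

// Same iPad User-Agent string as in the answer above.
$ua = 'Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) '
    . 'AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 '
    . 'Mobile/7B334b Safari/531.21.10';
$options = feed_context_options($ua);

// Default context: used by fopen() and file_get_contents() when no
// explicit context argument is passed.
stream_context_set_default($options);

// Explicit context for libxml-based loaders such as simplexml_load_file()
// and DOMDocument::load().
libxml_set_streams_context(stream_context_create($options));

// Network calls, left commented out because they are not verified here:
// $feed = simplexml_load_file('http://denverdarling.com/home?format=rss');
// $raw  = file_get_contents('http://denverdarling.com/home?format=rss');
```

Setting both the default context and the libxml context is a belt-and-braces choice. If you only use file_get_contents, passing the context as its third argument, as in the answer above, is enough.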