I need to validate a URL and get the page title. To do this I curl the URL and then extract the title using a regex. However, sometimes a site might be down or might not be 'curl-able'. For example, if you curl http://arsenal.com, it returns: This site has permanently moved to http://www.arsenal.com.
I could write a regex to check whether the returned text contains something like 'site', 'moved', and a URL, but that sounds stupid and overly complicated.
However, if I type http://arsenal.com into a web browser, it is automatically redirected to www.arsenal.com. How do browsers do this? What do you suggest, people of the internet?
Try curl -L. The -L switch causes curl to follow redirects if the server responds that the location has moved. The browser accomplishes this automatically by looking at the response code (in this case, 3XX) and then looking for the following header and redirecting to the value:
Location: newsite.com
I'm not sure how to use that switch from the PHP wrapper for curl, though; I'm not a PHP guy. I would assume there's a straightforward way.
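For illustration only, here is a rough Python sketch of the redirect-following logic described above: check for a 3XX status, read the Location header, and request the new URL. It is not how curl or any browser is actually implemented. The names resolve_redirect, follow_redirects, and the fetch callable are hypothetical, and the canned responses stand in for real HTTP requests so the example runs without a network.

```python
from urllib.parse import urljoin

# Status codes that normally carry a Location header pointing at the new URL.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve_redirect(current_url, status, headers):
    """Return the absolute redirect target, or None if this is not a redirect."""
    if status not in REDIRECT_CODES:
        return None
    # Header names are case-insensitive, so compare in lower case.
    location = next(
        (value for name, value in headers.items() if name.lower() == "location"),
        None,
    )
    if not location:
        return None
    # Location may be relative; resolve it against the URL we just requested.
    return urljoin(current_url, location)


def follow_redirects(fetch, url, max_redirects=10):
    """Call fetch(url) repeatedly until a non-redirect response comes back.

    `fetch` is a hypothetical callable returning (status, headers, body).
    """
    for _ in range(max_redirects + 1):
        status, headers, body = fetch(url)
        target = resolve_redirect(url, status, headers)
        if target is None:
            return url, status, body
        url = target
    raise RuntimeError("too many redirects")


# Canned responses standing in for the real site (assumed, not measured).
FAKE_PAGES = {
    "http://arsenal.com": (
        301,
        {"Location": "http://www.arsenal.com"},
        "This site has permanently moved to http://www.arsenal.com",
    ),
    "http://www.arsenal.com": (200, {}, "<title>Home</title>"),
}

final_url, final_status, final_body = follow_redirects(
    lambda u: FAKE_PAGES[u], "http://arsenal.com"
)
print(final_url, final_status)
```

The max_redirects cap is there so that two URLs redirecting to each other cannot loop forever; the default of 10 is an arbitrary choice for this sketch.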