amazon-cloudfront google-search-console google-crawlers search-engine-bots

Google not indexing a geo-restricted page distributed by CloudFront


I have a website hosted on AWS and served through CloudFront which, because of legal restrictions, must be accessible only from the UK and Ireland.

After setting up geo restriction in CloudFront and submitting the domain to Google via Webmaster Tools (at the beginning of last week, on 2 January), I noticed that the website has still not been indexed, or even recognized, by Google: searching for the domain or for site:mysite.co.uk returns no results.

My guess is that this happens because Google's crawler, which accesses the page from servers in the US, is served the generic error page saying that the site is intended for the UK and Ireland only (CloudFront answers requests from blocked countries with HTTP 403 Forbidden), and Google then refuses to index the site because it looks like a very low-quality website.
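To make the suspected mechanism concrete, here is a minimal sketch. It models the `Restrictions` block of a CloudFront distribution config (the structure the CloudFront API uses, here with an allow list of `GB` and `IE`) and a hypothetical helper, `viewer_status`, that mimics how CloudFront answers a viewer once it has resolved the viewer's country from the IP address: listed countries get the content, every other country gets HTTP 403. The helper is illustrative only; CloudFront performs this check itself and exposes no such function.

```python
# Restrictions block as it appears in a CloudFront DistributionConfig.
# RestrictionType "whitelist" means only the listed countries are allowed.
GEO_RESTRICTION = {
    "GeoRestriction": {
        "RestrictionType": "whitelist",
        "Quantity": 2,
        "Items": ["GB", "IE"],
    }
}


def viewer_status(country_code: str, restrictions: dict = GEO_RESTRICTION) -> int:
    """Hypothetical model of CloudFront's geo check: 200 if allowed, 403 if blocked."""
    geo = restrictions["GeoRestriction"]
    listed = country_code.upper() in geo.get("Items", [])
    if geo["RestrictionType"] == "whitelist":
        allowed = listed
    elif geo["RestrictionType"] == "blacklist":
        allowed = not listed
    else:  # "none": no geo restriction configured
        allowed = True
    return 200 if allowed else 403


if __name__ == "__main__":
    # A visitor in the UK or Ireland vs. a crawler fetching from a US data centre.
    for country in ("GB", "IE", "US"):
        print(country, viewer_status(country))
```

Under this model a crawler fetching from a US IP address always lands in the 403 branch, so it never sees the real content.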

Has anyone come across a similar problem and found a solution?

I am planning to submit a sitemap to Google Webmaster Tools to see whether that helps, but I am also wondering whether a robots.txt file would solve this issue.

If so, do you have any advice on the rules I should put in it? Until now I have only used this file to tell crawlers which parts of the website to exclude from indexing.

Any advice would be very helpful.

Thank you in advance,

Adam


Solution

  • Moving to AWS WAF (Web Application Firewall) worked. It gives you more control over which traffic is allowed to reach the site. We simply allow-listed the IP addresses of the Google, Facebook and Twitter crawlers, which are published online.
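The WAF approach can be sketched roughly as follows, assuming AWS WAF (the `wafv2` API) is attached to the CloudFront distribution in place of CloudFront's own geo restriction. The web ACL would block by default and contain one allow rule that matches either a UK/Ireland viewer (`GeoMatchStatement`) or a source IP in a crawler IP set (`IPSetReferenceStatement`). The crawler ranges are parsed from JSON in the layout assumed for Google's published Googlebot range list (a `prefixes` array of `ipv4Prefix`/`ipv6Prefix` objects); Facebook's and Twitter's ranges would need their own parsing. The sample prefixes, rule and metric names, and IP set ARNs are placeholders, not real values.

```python
import json

# Sample in the assumed layout of Google's published Googlebot range file.
# These prefixes are documentation placeholders; use the current published list.
SAMPLE_RANGES = """
{"creationTime": "placeholder",
 "prefixes": [{"ipv4Prefix": "192.0.2.0/27"},
              {"ipv4Prefix": "198.51.100.0/27"},
              {"ipv6Prefix": "2001:db8::/64"}]}
"""


def split_prefixes(raw_json: str) -> tuple[list[str], list[str]]:
    """Return (ipv4_cidrs, ipv6_cidrs); a WAF IP set holds only one IP version."""
    data = json.loads(raw_json)
    v4 = [p["ipv4Prefix"] for p in data["prefixes"] if "ipv4Prefix" in p]
    v6 = [p["ipv6Prefix"] for p in data["prefixes"] if "ipv6Prefix" in p]
    return v4, v6


def allow_rule(ip_set_arns: list[str], countries: list[str]) -> dict:
    """Build a wafv2 rule: allow if the viewer is in `countries` OR in a crawler IP set."""
    geo = {"GeoMatchStatement": {"CountryCodes": countries}}
    refs = [{"IPSetReferenceStatement": {"ARN": arn}} for arn in ip_set_arns]
    # An OrStatement needs at least two nested statements, so fall back to
    # the plain geo match when no IP sets are supplied.
    statement = {"OrStatement": {"Statements": [geo] + refs}} if refs else geo
    return {
        "Name": "allow-uk-ie-and-crawlers",  # hypothetical rule name
        "Priority": 0,
        "Statement": statement,
        "Action": {"Allow": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": "allowUkIeAndCrawlers",  # hypothetical metric name
        },
    }


if __name__ == "__main__":
    ipv4, ipv6 = split_prefixes(SAMPLE_RANGES)
    print(ipv4)  # ['192.0.2.0/27', '198.51.100.0/27']
    print(ipv6)  # ['2001:db8::/64']
    # Placeholder ARNs: in practice these come back from creating the IP sets.
    arns = [
        "arn:aws:wafv2:us-east-1:111122223333:global/ipset/googlebot-v4/EXAMPLE",
        "arn:aws:wafv2:us-east-1:111122223333:global/ipset/googlebot-v6/EXAMPLE",
    ]
    print(json.dumps([allow_rule(arns, ["GB", "IE"])], indent=2))
```

With the boto3 `wafv2` client, the IP sets would be created with `create_ip_set` and the rule list passed as `Rules` to `create_web_acl` together with `DefaultAction={"Block": {}}` and `Scope="CLOUDFRONT"` (which requires the `us-east-1` region). Crawler ranges change over time, so the IP sets would need periodic refreshing.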