Tags: javascript, http-redirect, cloudflare, googlebot, cloudflare-workers

Trying to redirect country-specific users without affecting search engine crawlers


I'm trying to prevent US visitors from accessing non-US pages by redirecting them with a Cloudflare Worker. However, I still want search engines to be able to crawl the entire site, so crawlers should bypass the redirect.

Cloudflare gives me the visitor's country and tells me whether the visitor is a verified bot.

All US page URLs on the site are prefixed with /us/, so I can use a basic regex on the path to check which part of the site is being requested.

So this is what I have implemented...

export default {
  async fetch(request) {

    // Get the visitor's country code.
    // @link https://developers.cloudflare.com/workers/runtime-apis/request/
    const visitorCountry = request.cf?.country;

    // Get the bot management status.
    // @link https://developers.cloudflare.com/bots/reference/bot-management-variables/#workers-variables
    // @link https://radar.cloudflare.com/traffic/verified-bots
    const requestIsVerifiedBot = request?.cf?.botManagement?.verifiedBot;

    const requestUrl = new URL(request.url);
    // True for "/us", "/us/" and anything under "/us/" (case-insensitive).
    const requestUrlIsUs = /^\/us(\/|$)/i.test(requestUrl.pathname);

    // If the visitor is from the US, and they are accessing a non-US page, and they are not a verified robot.
    if (visitorCountry === 'US' && !requestUrlIsUs && !requestIsVerifiedBot) {
      return Response.redirect('https://example.com/us/', 301); // Go back to the US homepage.
    }

    // Continue through.
    return fetch(request);
  }
}

The redirect works as expected. However, the verified bot condition always appears to fail, so crawlers are redirected as well. I'm not sure why this is happening; based on the Cloudflare documentation, the check should work.

Any help would be appreciated!


Solution

  • You will need to enable Bot Management on your account. This is a paid Cloudflare feature.

    Unfortunately, due to a historical accident, request.cf.botManagement is present even if you have not subscribed to Bot Management, but in that case it contains dummy values: the content is the same regardless of the request. We (Cloudflare) would like to remove the property entirely when you don't have a subscription, but some Worker scripts in the wild accidentally depend on this field existing even though their owners haven't subscribed to the feature, so getting rid of it is complicated.
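
As a rough sketch, the redirect decision from the question can be pulled out into a small pure function, which makes it easy to check the logic separately from what Cloudflare actually sends. The name shouldRedirectToUs and its parameter object are hypothetical and not part of any Cloudflare API. The sketch assumes verifiedBot only carries a meaningful value once Bot Management is enabled; without a subscription it is a dummy value, so crawlers will be treated like any other visitor.

```javascript
// Hypothetical helper (not a Cloudflare API): decides whether a request
// should be redirected to the US homepage.
// Assumption: `verifiedBot` is only meaningful when Bot Management is
// enabled on the account; otherwise it is a dummy value.
function shouldRedirectToUs({ country, pathname, verifiedBot }) {
  // True for "/us", "/us/" and anything under "/us/" (case-insensitive).
  const isUsPath = /^\/us(\/|$)/i.test(pathname);

  // Redirect only US visitors on non-US pages that are not verified bots.
  // A missing or non-boolean `verifiedBot` is treated as "not a bot".
  return country === 'US' && !isUsPath && verifiedBot !== true;
}

// Example calls with made-up inputs.
console.log(shouldRedirectToUs({ country: 'US', pathname: '/uk/about', verifiedBot: false }));
console.log(shouldRedirectToUs({ country: 'US', pathname: '/uk/about', verifiedBot: true }));
console.log(shouldRedirectToUs({ country: 'US', pathname: '/us/about', verifiedBot: false }));
```

Inside the Worker's fetch handler, the function would be called with request.cf?.country, new URL(request.url).pathname and request.cf?.botManagement?.verifiedBot, keeping the Response.redirect call from the question unchanged.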