Tags: web, seo, robots.txt, noindex

robots.txt exclude paths with language


For example, suppose I want to disallow the following path on my site: http://www.examplepage.com/en/testing

I want to disallow not only the /en/ version but also /da/, /de/, and so on (you get my point).

What is the smartest way to do that, without writing a separate Disallow rule for the same page in every language?

I tried: Disallow: /*/testing and Disallow: /*testing

However, I found that these rules also disallow other pages, such as http://www.examplepage.com/en/news-page/testing.

Only the path I specified at the beginning of my post should be disallowed.
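To see why both attempts over-match, here is a minimal Python sketch of how a crawler that supports wildcards might evaluate a single Disallow pattern. It assumes the semantics described in RFC 9309 (and used by Google): * matches any run of characters, including /, a trailing $ anchors the end of the path, and otherwise a rule matches any path that starts with it. The function names are hypothetical, and the sketch ignores Allow rules, rule-length precedence and percent-encoding, which real crawlers also handle.

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex (assumed RFC 9309 semantics)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]  # drop the end-of-path anchor; re-added below
    # Escape regex metacharacters, then turn each escaped '*' into '.*'.
    # '.*' also matches '/', so '*' can span several path segments.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(rule: str, path: str) -> bool:
    # re.match anchors only at the start, which gives robots.txt prefix matching.
    return robots_pattern_to_regex(rule).match(path) is not None

if __name__ == "__main__":
    for rule in ["/*/testing", "/*testing", "/*/testing$"]:
        for path in ["/en/testing", "/en/news-page/testing", "/en/testing-archive"]:
            print(f"{rule:<14} {path:<24} blocked={is_blocked(rule, path)}")
```

Under these assumptions all three rules block /en/news-page/testing, because the * happily absorbs "en/news-page". Adding $ only stops the prefix match on paths like /en/testing-archive; it cannot restrict * to a single segment.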


Solution

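  • robots.txt has no real regular-expression support. The only pattern characters are * (any sequence of characters, including /) and $ (end of the URL), and these began as non-standard extensions (later standardized in RFC 9309) that not every crawler supports, so pattern-based rules are very limited and not reliable. Because * also matches /, there is no way to express "exactly one path segment here", which is why /*/testing also matches /en/news-page/testing. See this question for more information: Regexp for robots.txt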

    The best solution is probably to list each language separately, or to write a script that generates the full list of rules.
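The script approach can be as small as the Python sketch below, which prints one plain Disallow line per language and path. The language list, the path list and the build_robots_txt helper are hypothetical placeholders to be replaced with the site's real values; the output uses no wildcards, so it should work with any crawler that follows the basic prefix rules.

```python
# Hypothetical values: replace with the site's real language prefixes and paths.
LANGUAGES = ["en", "da", "de"]
BLOCKED_PATHS = ["testing"]  # written without a leading slash

def build_robots_txt(languages, paths, user_agent="*"):
    """Return robots.txt text with one Disallow line per language/path pair."""
    lines = [f"User-agent: {user_agent}"]
    for lang in languages:
        for path in paths:
            # A plain prefix rule: no wildcard support needed from the crawler.
            lines.append(f"Disallow: /{lang}/{path}")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(build_robots_txt(LANGUAGES, BLOCKED_PATHS), end="")
```

With the placeholder values this prints User-agent: * followed by Disallow: /en/testing, Disallow: /da/testing and Disallow: /de/testing. Note that each rule is still a prefix match, so /en/testing also covers /en/testing-archive, but it no longer touches /en/news-page/testing.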