Block a site from search engine - DuckDuckGo


I have a development site https://text-domain.example. When I go to https://duckduckgo.com and search for text-domain.example, it does return results.

What I have tried so far:

I created a robots.txt file with the following content and put it in my root directory (i.e. at text-domain.example/robots.txt):

User-agent: *
Disallow: /

Then I added this meta tag to my template file:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

Even after doing this, DuckDuckGo still returns the same results when I search. Any suggestions would be welcome.

PS:

After waiting a few days, I have two findings:

  • The search results are still shown.
  • But I see a message on that result saying: "We would like to show you a description here but the site won't allow us."

Is it possible to block the site from showing up in the results completely?


Solution

  • DuckDuckGo should honour your robots.txt. Their bot DuckDuckBot is documented at https://duckduckgo.com/duckduckbot.

    But note: DuckDuckBot doesn’t crawl everything itself (DuckDuckGo also gets results from other sources), so your pages might still show up unless you also block the bots of those sources (like Bing’s). Refer to mlissner’s answer for more details.
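    One way to check which crawlers a robots.txt actually shuts out is to evaluate it locally with Python's standard `urllib.robotparser`. This is a sketch under assumptions: the user-agent tokens `DuckDuckBot` and `bingbot` are taken as the names those crawlers match on (check each engine's documentation), and `BLOCK_DDG_ONLY` is a hypothetical file added for contrast. A rule group that names only DuckDuckBot leaves other engines' bots free to crawl, while the question's `User-agent: *` group applies to every compliant bot.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question: block every crawler from the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical variant that only names DuckDuckBot; other crawlers are unaffected.
BLOCK_DDG_ONLY = """\
User-agent: DuckDuckBot
Disallow: /
"""

# Assumed user-agent tokens; check each engine's docs for the exact names.
AGENTS = ["DuckDuckBot", "bingbot"]


def blocked_agents(robots_txt, agents, url="https://text-domain.example/"):
    """Return the user agents that robots_txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


print(blocked_agents(BLOCK_ALL, AGENTS))       # ['DuckDuckBot', 'bingbot']
print(blocked_agents(BLOCK_DDG_ONLY, AGENTS))  # ['DuckDuckBot']
```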

    With robots.txt, there are two things to consider:

    • It takes time for changes to your robots.txt to be recognized. You have to wait until the relevant bot visits your site again.
    • Even if your URLs are blocked in the robots.txt, search engines may still list your URLs in their search results (without crawled metadata like title and description).
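    To make both points concrete, here is a toy model of a search index, written in Python for illustration only. `ToyIndex`, `cache_ttl` and the one-day TTL are hypothetical and do not describe how DuckDuckGo, Bing or Google actually work. The index re-reads robots.txt only when its cached copy expires, and when a URL it learned about from a link is disallowed, it still lists the URL, just without crawled content (mimicking the message quoted in the question).

```python
from urllib.robotparser import RobotFileParser

# Placeholder shown for a URL that is listed but could not be crawled
# (wording copied from the message quoted in the question).
NO_DESCRIPTION = "We would like to show you a description here but the site won't allow us."


class ToyIndex:
    """Hypothetical search index; not how any real engine is implemented."""

    def __init__(self, agent, cache_ttl):
        self.agent = agent
        self.cache_ttl = cache_ttl  # seconds a cached robots.txt is trusted (assumed)
        self.robots = None          # cached RobotFileParser
        self.fetched_at = None      # time the cached copy was read
        self.results = {}           # url -> description shown in results

    def load_robots(self, robots_txt, now):
        """Re-read robots.txt only if there is no cached copy or it has expired."""
        if self.robots is None or now - self.fetched_at >= self.cache_ttl:
            parser = RobotFileParser()
            parser.parse(robots_txt.splitlines())
            self.robots, self.fetched_at = parser, now

    def discover(self, url, page_description):
        """Record a URL found via a link; crawl it only if robots.txt allows."""
        if self.robots.can_fetch(self.agent, url):
            self.results[url] = page_description
        else:
            # Crawling is disallowed, but the URL itself is already known,
            # so it is still listed, just without crawled metadata.
            self.results[url] = NO_DESCRIPTION


index = ToyIndex("DuckDuckBot", cache_ttl=24 * 3600)
url = "https://text-domain.example/"
block_all = "User-agent: *\nDisallow: /\n"

# Day 0: no robots.txt rules yet, so the page is crawled normally.
index.load_robots("", now=0)
index.discover(url, "Development site")

# One hour later the owner adds "Disallow: /", but the cached copy is still used.
index.load_robots(block_all, now=3600)
index.discover(url, "Development site")
print(index.results[url])  # Development site

# Once the cache expires the new rules apply, yet the URL stays in the results.
index.load_robots(block_all, now=24 * 3600)
index.discover(url, "Development site")
print(index.results[url])  # prints NO_DESCRIPTION
```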

    Using the robots-meta element with noindex would prevent even listing the URLs in search engines like Google, but DDG doesn’t seem to support it.
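    One caveat when combining the two: a crawler can only read the robots meta element if it is allowed to fetch the page, so blocking the same URL in robots.txt hides the `noindex` from it (Google's documentation states that a page must not be blocked by robots.txt for `noindex` to take effect). The sketch below illustrates this with Python's `html.parser` and `urllib.robotparser`; `RobotsMetaParser` and `may_list` are hypothetical helpers, `Googlebot` is an assumed user-agent token, and the decision logic is a simplification, not any engine's real algorithm.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not values.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def may_list(robots_txt, agent, url, html):
    """Hypothetical crawler decision: may url appear in results at all?"""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(agent, url):
        # The page is never fetched, so its noindex is never seen;
        # the bare URL may still be listed from links elsewhere.
        return True
    meta = RobotsMetaParser()
    meta.feed(html)
    meta.close()
    return "noindex" not in meta.directives


page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head></html>'
url = "https://text-domain.example/"

print(may_list("", "Googlebot", url, page))                              # False
print(may_list("User-agent: *\nDisallow: /\n", "Googlebot", url, page))  # True
```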

    Note that you used the wrong quotation marks in your example (typographic quotes instead of straight ASCII quotes). It should be

    <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

    instead of

    <META NAME=”ROBOTS” CONTENT=”NOINDEX, NOFOLLOW”>
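    If the markup was pasted from a word processor or a CMS that applies "smart quotes", a small script can straighten them. This is a minimal Python sketch; `straighten_quotes` is a hypothetical helper, and it only handles the four common curly quote characters.

```python
# Typographic quotes that word processors often substitute for the
# straight ASCII quotes HTML attribute values need.
SMART_QUOTES = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})


def straighten_quotes(markup):
    """Replace typographic quotes with straight ASCII quotes."""
    return markup.translate(SMART_QUOTES)


broken = "<META NAME=\u201dROBOTS\u201d CONTENT=\u201dNOINDEX, NOFOLLOW\u201d>"
print(straighten_quotes(broken))
# <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
```

    Run it on template markup only: it also turns apostrophes in visible text (such as `won’t`) into straight ones.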