search-engine, web-crawler, depth-first-search, breadth-first-search

BFS or DFS for a web crawler?


I was tasked with creating a simple web crawler for a search engine. How exactly should the crawler map the web? Should it follow the first link it finds and never go back, or use a more systematic search method like BFS or DFS?


Solution

  • I realize I am a bit late in responding to the question, but it's an interesting discussion nevertheless.

    BFS seems to be a good strategy, as it can help *avoid continuous requests to a single host*, to an extent, though that also depends on your domain. You'll still have to handle server timeouts, but DFS would definitely do more harm here, since it tends to keep following links deeper into the same site. With DFS, cyclic references can also send the crawler into an infinite loop unless you make explicit arrangements, such as tracking visited URLs (BFS benefits from the same bookkeeping).

    There can be other more suitable choices, but between DFS and BFS, BFS wins in my opinion.
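
To make the BFS approach concrete, here is a minimal sketch, not a production crawler. It assumes a caller-supplied `get_links(url)` function that returns the outgoing links of a page; that name, `bfs_crawl`, `max_pages`, and the in-memory `FAKE_WEB` graph are all hypothetical and stand in for real HTTP fetching and HTML parsing. The visited set is the "explicit arrangement" that keeps cyclic links from looping forever.

```python
from collections import deque


def bfs_crawl(start_url, get_links, max_pages=100):
    """Visit pages breadth-first and return URLs in visiting order."""
    visited = {start_url}        # every URL ever queued, so cycles are harmless
    queue = deque([start_url])   # FIFO queue is what makes this breadth-first
    order = []

    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in visited:
                visited.add(link)
                queue.append(link)
    return order


# Hypothetical link graph containing a cycle (a -> b -> a),
# used here instead of real network requests.
FAKE_WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://a.example/", "http://b.example/page"],
    "http://c.example/": [],
    "http://b.example/page": [],
}


def fake_get_links(url):
    """Stand-in for fetching a page and extracting its links."""
    return FAKE_WEB.get(url, [])


if __name__ == "__main__":
    for page in bfs_crawl("http://a.example/", fake_get_links):
        print(page)
```

Swapping `queue.popleft()` for `queue.pop()` would turn this into a DFS-style traversal, which is a quick way to compare the two orderings on your own data. A real crawler would still need timeouts, robots.txt handling, and per-host rate limiting on top of this.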