I'm using Searcharoo.NET to crawl one language section of a website, "testsite.com/en". There are also "testsite.com/fr", "testsite.com/us" and so on. Later I want to index the pages of each section so they are available for searching, but I want the different languages kept separate.
The problem is that when Searcharoo starts crawling at testsite.com/en it also indexes pages from the other languages, such as testsite.com/fr. Is there a way to prevent this from happening? I thought I could restrict the crawler to only search forward, or tell it to stop on certain pages, but I have not found any Searcharoo documentation on the subject.
Much appreciated, thanks!
Please have a look at the following blog post:
http://draganbl.blogspot.com/2011/04/how-do-you-use-searcharoo-library-to.html
It does not seem that you can do exactly what you want, but you could set up a separate "crawler/spider" for each individual language. My answer is pretty vague, but maybe it can point you in the right direction.
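To make the "one crawler per language" idea a bit more concrete, here is a minimal sketch of the check each crawler would need: only follow a link if it is on the same host and under the start URL's path. This is not Searcharoo code and I don't know whether Searcharoo exposes a hook for it. The function name `in_scope` is made up for illustration, and it is written in Python only to show the logic; you would have to port it to C# and call it wherever your spider decides whether to follow a link.

```python
from urllib.parse import urlsplit


def in_scope(url: str, start_url: str) -> bool:
    """Return True if url is on the same host as start_url and under its path.

    Hypothetical helper, not part of Searcharoo.
    """
    start = urlsplit(start_url)
    target = urlsplit(url)

    # Different host: never follow.
    if target.netloc.lower() != start.netloc.lower():
        return False

    # Compare whole path segments so "/en" does not match "/english".
    prefix = start.path.rstrip("/") + "/"
    path = target.path if target.path.endswith("/") else target.path + "/"
    return path.startswith(prefix)


# Example: links found on a page, filtered for the "/en" crawler.
links = [
    "http://testsite.com/en/about.html",
    "http://testsite.com/fr/index.html",
    "http://testsite.com/us/contact.html",
]
to_follow = [link for link in links if in_scope(link, "http://testsite.com/en")]
```

With a check like this in place, you would run one crawl per language (start URL "testsite.com/en", then "testsite.com/fr", and so on) and write each crawl to its own index, so the languages stay separate.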