Tags: asp.net, solr, web-crawler, nutch, solrnet

Custom Parser for Nutch (or open source .NET Crawler)


I have been using Nutch/Solr/SolrNet for my search solutions, and I must say it works a treat. On a new site I'm working on, I am using ASP.NET master pages; as a result, the shared header and footer content is indexed along with every page and distorts the results. For example, the header contains a link to the Contact Us page, so when I search for 'Contact', every page on the site is returned.

Is there a customizable Nutch parser to which I can pass a div id, so that it indexes only the content inside that div?

Alternatively, are there any .NET-based crawlers that I can customize?


Solution

  • See https://issues.apache.org/jira/browse/NUTCH-585 and https://issues.apache.org/jira/browse/NUTCH-961

    BTW, you'd likely get a more relevant audience by posting to the Nutch user mailing list.
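Independently of those JIRA issues, the filtering asked about here (keep only the text inside one div, selected by its id) can be sketched in plain Java, the language Nutch itself is written in. This is a minimal illustration under stated assumptions, not Nutch's parse-plugin API: it uses the JDK's built-in `javax.swing.text.html.parser.ParserDelegator`, and the class name `DivTextExtractor`, the method `extractDivText`, and the `content` id are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of Nutch: keeps only the text of one div chosen by id.
public class DivTextExtractor {

    /** Returns the text inside the first div whose id equals targetId, or "" if none matches. */
    public static String extractDivText(String html, String targetId) {
        StringBuilder out = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private int depth = 0;        // div nesting depth inside the target; 0 means "outside"
            private boolean done = false; // set once the first matching div has closed

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (done || tag != HTML.Tag.DIV) return;
                if (depth > 0) {
                    depth++; // a div nested inside the target
                } else if (targetId.equals(attrs.getAttribute(HTML.Attribute.ID))) {
                    depth = 1; // entering the target div
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (done || tag != HTML.Tag.DIV || depth == 0) return;
                depth--;
                if (depth == 0) done = true; // target div closed; ignore the rest of the page
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (depth > 0) { // only collect text while inside the target div
                    if (out.length() > 0) out.append(' ');
                    out.append(data);
                }
            }
        };

        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected when reading from a String
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String page = "<html><body>"
                + "<div id=\"header\"><a href=\"/contact\">Contact Us</a></div>"
                + "<div id=\"content\"><p>Welcome to our product page.</p></div>"
                + "<div id=\"footer\">Contact | Privacy</div>"
                + "</body></html>";
        // Only the body text should survive; the header/footer "Contact" links are skipped.
        System.out.println(extractDivText(page, "content"));
    }
}
```

The design choice is a simple depth counter, so divs nested inside the target are kept while the header and footer divs are skipped. Note that the Swing parser follows an old HTML 3.2 DTD, so real-world markup may call for a more tolerant parser. In an actual Nutch deployment, logic like this would have to live inside a parse or indexing filter plugin, and how that is wired in depends on the Nutch version.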