The company I work for has millions of documents stored and shared on multiple network drives that are mapped to users' local drive letters (e.g. D:\ mapped to \\server1\, etc.).
What I'd like to implement is a crawler over those network drives so that users can find files quickly through full-text indexing.
My current indexing strategy is based on Lucene.Net, but I am not sure how often I should re-index the network drives, because there are millions of documents to index, not to mention all the packets travelling over the network on each crawl.
So the question is: how should I decide on an indexing frequency?
I've researched how often Google Desktop and Windows Desktop Search re-index as a point of reference, but that has been fruitless.
A lot of the answer is wrapped up in whatever service level agreements you have with your customers. If your SLA states that search results must be current to within X minutes, then that answers the question of how frequently you should index.
If you, like me, do not have concrete SLAs in place for searching and indexing, then you can be more flexible. For example, I manage, among other things, a SharePoint Search server for my business. In addition to our web site, we also index a lot of content in unstructured file space. The server supports full and incremental crawls. We timed several incremental crawls to get an estimate of how long one takes to complete, then scheduled our incremental crawls on an interval comfortably larger than the observed elapsed time. We scheduled full crawls to occur less frequently, at non-peak times.
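To make the incremental part concrete, here is a minimal sketch of that pattern using Lucene's Java API (Lucene.Net mirrors these classes closely). The share path, index location, and field names are assumptions for illustration; the key idea is to key each document on its file path and skip anything not modified since the previous pass.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.store.FSDirectory;

    import java.io.IOException;
    import java.nio.file.*;
    import java.nio.file.attribute.BasicFileAttributes;
    import java.time.Instant;

    public class IncrementalCrawler {
        // Assumed locations -- replace with your real share and index paths.
        private final Path share = Paths.get("\\\\server1\\docs");
        private final Path indexPath = Paths.get("D:\\search-index");
        private Instant lastCrawl = Instant.EPOCH; // persist this between runs in practice

        public void crawlOnce() throws IOException {
            IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
            cfg.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
            try (IndexWriter writer = new IndexWriter(FSDirectory.open(indexPath), cfg)) {
                Instant started = Instant.now();
                Files.walkFileTree(share, new SimpleFileVisitor<Path>() {
                    @Override
                    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                        // Only files modified since the last pass are read, so most of the
                        // millions of documents never travel over the network.
                        if (attrs.lastModifiedTime().toInstant().isAfter(lastCrawl)) {
                            Document doc = new Document();
                            doc.add(new StringField("path", file.toString(), Field.Store.YES));
                            doc.add(new TextField("contents", new String(Files.readAllBytes(file)), Field.Store.NO));
                            // updateDocument replaces any existing entry for this path,
                            // so repeated crawls never produce duplicates.
                            writer.updateDocument(new Term("path", file.toString()), doc);
                        }
                        return FileVisitResult.CONTINUE;
                    }
                });
                lastCrawl = started;
            }
        }
    }

In Lucene.Net the same classes exist under the Lucene.Net namespaces, and a real crawler would extract text from Office/PDF formats (e.g. via IFilter or Tika) rather than reading raw bytes as above.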
The specifics will vary with the indexing technology you use, but the principle is the same: measure how long a crawl actually takes, schedule incremental crawls on an interval comfortably larger than that measurement, and run full crawls less often, during off-peak hours.
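As a rough sketch of that scheduling idea (the 30-minute figure below is purely an assumed placeholder, not a recommendation -- pick an interval larger than the crawl time you actually measure):

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class CrawlScheduler {
        public static void main(String[] args) {
            IncrementalCrawler crawler = new IncrementalCrawler(); // sketch from above
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

            // scheduleWithFixedDelay waits for each pass to finish before the next delay
            // starts, so a slow crawl can never overlap with the following one.
            scheduler.scheduleWithFixedDelay(() -> {
                try {
                    crawler.crawlOnce();
                } catch (Exception e) {
                    e.printStackTrace(); // log it and let the next run retry
                }
            }, 0, 30, TimeUnit.MINUTES);
        }
    }

Full crawls (to pick up deletions and anything an incremental pass misses) can then be a second, much less frequent job pinned to off-peak hours, for example via Windows Task Scheduler.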
Good luck!