I'm trying to speed up the indexing of my files with Lucene. For this, I created a worker, "LuceneWorker", that does the job.
Given the code below, the 'concurrent' execution becomes significantly slow. I think I know why: the list of futures grows to the point where there is hardly any memory left to run yet another LuceneWorker task.
Q: Is there a way to limit the number of 'workers' that go into the executor? In other words, if there are already 'n' futures, can I stop submitting and let those documents be indexed first?
My intuition is that I should build a producer/consumer with an ArrayBlockingQueue, but I'd like to know whether I'm right before I redesign it.
    ExecutorService executor = Executors.newFixedThreadPool(cores);
    List<Future<List<Document>>> futures = new ArrayList<Future<List<Document>>>(3);
    for (File file : files)
    {
        if (isFileIndexingOK(file))
        {
            System.out.println(file.getName());
            Future<List<Document>> future = executor.submit(new LuceneWorker(file, indexSearcher));
            futures.add(future);
        }
        else
        {
            System.out.println("NOT A VALID FILE FOR INDEXING: " + file.getName());
            continue;
        }
    }
    int index = 0;
    for (Future<List<Document>> future : futures)
    {
        try
        {
            List<Document> docs = future.get();
            for (Document doc : docs)
                writer.addDocument(doc);
        }
        catch (Exception exp)
        {
            // exp code comes here.
        }
    }
If you want to limit the number of waiting jobs, use a ThreadPoolExecutor with a bounded queue such as ArrayBlockingQueue. Also roll your own RejectedExecutionHandler so that the submitting thread waits for capacity in the queue. You cannot use the convenience methods in Executors for that, as newFixedThreadPool uses an unbounded LinkedBlockingQueue.
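
A minimal sketch of that setup, assuming a fixed number of threads and a small queue capacity. The class name BoundedExecutorSketch, the helper newBlockingPool, and the numbers used in main are hypothetical and only for illustration; the handler simply blocks the submitting thread on the queue's put() until a slot frees up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedExecutorSketch {

    // Hypothetical helper: a fixed-size pool whose submit() blocks when the queue is full.
    static ThreadPoolExecutor newBlockingPool(int threads, int queueCapacity) {
        RejectedExecutionHandler blockUntilSpace = (task, pool) -> {
            if (pool.isShutdown()) {
                throw new RejectedExecutionException("pool is shut down");
            }
            try {
                // Called when the bounded queue is full: wait for a free slot
                // instead of throwing, so the submitting thread is throttled.
                pool.getQueue().put(task);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RejectedExecutionException("interrupted while waiting for queue space", e);
            }
        };
        return new ThreadPoolExecutor(
                threads, threads,                      // core size == max size, like newFixedThreadPool
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity), // bounded, unlike LinkedBlockingQueue
                blockUntilSpace);
    }

    public static void main(String[] args) throws Exception {
        // Assumed values: 2 worker threads, at most 3 tasks waiting.
        ThreadPoolExecutor pool = newBlockingPool(2, 3);
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            final int n = i;
            // Stand-in for "new LuceneWorker(file, indexSearcher)".
            futures.add(pool.submit(() -> {
                Thread.sleep(10);
                return n;
            }));
        }
        int completed = 0;
        for (Future<Integer> future : futures) {
            future.get();
            completed++;
        }
        pool.shutdown();
        System.out.println("completed tasks: " + completed);
    }
}
```

Note that this only bounds the number of tasks waiting to run. Results of finished tasks still sit in their futures until they are consumed, so if the returned document lists are large it may also help to drain the futures while submitting rather than only after the submit loop ends.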