Tags: java, performance, while-loop, lucene, large-data

Lucene Document creation in while loop slows down more and more


I have an efficiency problem. I'm developing an enterprise application that is deployed on a JBoss EAP 6.1 server as an EAR archive. In a while loop I create new objects based on entities and write them to a file. I fetch those entities (with the help of an EJB DAO) in limited batches (for example 2000 per step). The problem is that I need to process millions of objects: the first million goes quite smoothly, but the further the loop advances, the slower it works. Can anyone tell me why it gets slower and slower as the loop goes on? How can I make it run smoothly all the way through? Here are the crucial parts of the code:

    public void createFullIndex(int stepSize) {
       int logsNumber = systemLogDao.getSystemLogsNumber();
       int counter = 0;
       while (counter < logsNumber) {
           for (SystemLogEntity systemLogEntity : systemLogDao.getLimitedSystemLogs(counter, stepSize)) {
               addDocument(systemLogEntity);
           }
           counter = counter + stepSize;
       }
       commitIndex();
    }

    public void addDocument(SystemLogEntity systemLogEntity) {
       try {
        Document document = new Document();
        document.add(new NumericField("id", Field.Store.YES, true).setIntValue(systemLogEntity.getId()));
        document.add(new Field("resource", (systemLogEntity.getResource() == null ? "" : systemLogEntity
                .getResource().getResourceCode()), Field.Store.YES, Field.Index.ANALYZED));
         document.add(new Field("operationType", (systemLogEntity.getOperationType() == null ? "" : systemLogEntity
                 .getOperationType()), Field.Store.YES, Field.Index.ANALYZED));
        document.add(new Field("comment",
                (systemLogEntity.getComment() == null ? "" : systemLogEntity.getComment()), Field.Store.YES,
                Field.Index.ANALYZED));
        indexWriter.addDocument(document);
       } catch (CorruptIndexException e) {
           LOGGER.error("Failed to add the following log to Lucene index:\n" + systemLogEntity.toString(), e);
       } catch (IOException e) {
           LOGGER.error("Failed to add the following log to Lucene index:\n" + systemLogEntity.toString(), e);
       }
    }

I would appreciate your help!


Solution

  • As far as I can see, you do not write your data to the file as you receive it. Instead you build the full DOM object and only then flush it to the file. That strategy is fine for a limited number of objects, but in your case, with millions of them (as you said), you should not use DOM. Instead, create your XML fragments and write them to the file while you are receiving the data. This will reduce your memory consumption and hopefully improve the performance.
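A minimal sketch of that streaming approach, using the standard StAX `XMLStreamWriter` from `javax.xml.stream`. The element names (`logs`, `log`, `comment`) and the `writeLog` helper are made up for illustration; in the real application the loop body would be fed by the paged DAO calls, and the `StringWriter` would be a `FileWriter`:

```java
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import java.io.StringWriter;

public class StreamingLogWriter {

    // Writes one <log> fragment per entity as it arrives, instead of
    // holding the whole document tree in memory and flushing it at the end.
    static void writeLog(XMLStreamWriter xml, int id, String comment)
            throws XMLStreamException {
        xml.writeStartElement("log");
        xml.writeAttribute("id", Integer.toString(id));
        xml.writeStartElement("comment");
        xml.writeCharacters(comment == null ? "" : comment);
        xml.writeEndElement(); // </comment>
        xml.writeEndElement(); // </log>
    }

    public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter(); // in production: a FileWriter
        XMLStreamWriter xml =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartDocument();
        xml.writeStartElement("logs");
        for (int id = 0; id < 3; id++) { // stands in for the paged DAO loop
            writeLog(xml, id, "comment-" + id);
        }
        xml.writeEndElement(); // </logs>
        xml.writeEndDocument();
        xml.close();
        System.out.println(out.toString());
    }
}
```

Because each fragment is written out as soon as it is produced, memory use stays flat no matter how many entities the loop processes, which is the point of avoiding a full in-memory DOM here.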