java, mysql, large-files, heap-size

Java heap size issue when a result set returns a large amount of data


I have a Java/Spring/Hibernate web application that calls a MySQL stored procedure which returns user-specific "event" information.

The requirement is to write the result set into a text file and send it to an FTP server. I have an allocated heap size of 4 GB and cannot allocate more than that. However, a heap size issue occurs when the stored procedure returns a large result set and it is processed on the server side.

Current code:

1. The stored procedure returns a large result set (say, a few million rows).

2. The result set is mapped to a list.

3. Each row is written into a text file. For a single user only 3 events are written; any further events for that user are ignored (note that MySQL returns the result set sorted by email). This processing has to happen on the server side.

4. The heap size issue appears when the result set is large.

Pseudo code:

    Stored proc:
        drop the temp tables if they exist
        get public events into temp table temp1 based on a condition
        get non-public events into temp table temp2
        create a new temp table temp3 by joining temp1 and temp2, join that with the user table on some condition, and sort it
        return select * from temp3 sorted by email

Even after adding all the required indexes it normally takes about 20 minutes to finish.

Server-side code:

    // 1: create the file writer
    FileWriter fw = new FileWriter(...);
    BufferedWriter bw = new BufferedWriter(fw);

    // 2: call the stored procedure
    import javax.persistence.EntityManager;
    ...
    Query query = entityManager.createNativeQuery("CALL procName(param)");
    ...
    evntList = query.getResultList();
    ...
    // 3: map every row to a DTO and collect the DTOs in a list
    for (final Object[] objects : evntList) {
        final EvntDTO dto = new ExactTargetDailyMailDTO(
                String.valueOf(objects[0]),
                String.valueOf(objects[1]), ..., ...);
        if (dto.getEvntName() matches some condition) {
            // change event name and other params accordingly
        }
        eventList.add(dto);
    }
    ...
    // 4: write one line per user, with at most 3 events
    for (int i = 0; i < eventList.size(); i++) {
        EvntDTO dto = eventList.get(i);
        count = 1;
        ...
        fileContent = dto.getEmail() + ... /* more user info */ + dto.getEventInfo();

        while (i + 1 < eventList.size()
                && dto.getEmail().equalsIgnoreCase(remindersList.get(i + 1).getEmail())) {
            if (count < 3) {
                ...
                fileContent = fileContent + ... /* next EventInfo */;
            }
            ...
        }
        ...
        bw.write(fileContent);
        bw.newLine();
    }
    bw.close();
    fw.close();

Can anyone suggest a better plan for doing this?

Since I also need to do some more processing on this list (e.g. if a user has more than 3 events, ignore the rest), I can't simply take 1,000 or 100,000 rows at a time, write them to the file, and repeat the process.

Can anyone suggest a better way to architect this?


Solution

  • Skip step 2: storing everything into a list.

    That way you force your whole streamed ResultSet into memory.

    If you opened the file before running the SQL and then wrote a line for every rs.next(), you would only ever have a few rows in memory at a time (see the sketch after the alternative below).

    If a user has more than 3 events, simply skip the extra rows as you read them.

    or

    Rewrite your SQL to return only the 3 rows you need per user (use LIMIT 3) - that way there is no big result set to handle.
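
    A minimal sketch of the streaming variant, assuming plain JDBC with MySQL Connector/J, where setFetchSize(Integer.MIN_VALUE) on a forward-only, read-only statement makes the driver stream rows one at a time; the connection details, the procedure parameter and the column names (email, event_info) are placeholders, not taken from your code:

        import java.io.BufferedWriter;
        import java.io.FileWriter;
        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.PreparedStatement;
        import java.sql.ResultSet;

        public class StreamingEventExport {

            public static void main(String[] args) throws Exception {
                try (Connection con = DriverManager.getConnection(
                             "jdbc:mysql://localhost:3306/mydb", "user", "pass");
                     PreparedStatement ps = con.prepareStatement(
                             "CALL procName(?)",
                             ResultSet.TYPE_FORWARD_ONLY,
                             ResultSet.CONCUR_READ_ONLY);
                     BufferedWriter bw = new BufferedWriter(new FileWriter("events.txt"))) {

                    // Connector/J-specific hint: stream rows instead of buffering the whole result.
                    ps.setFetchSize(Integer.MIN_VALUE);
                    ps.setString(1, "someParam");

                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            // Only the current row is in memory; write it straight to the file.
                            bw.write(rs.getString("email") + "|" + rs.getString("event_info"));
                            bw.newLine();
                        }
                    }
                }
            }
        }

    If you want to stay on Hibernate/JPA, the same idea applies as long as the rows are written out while scrolling through the results rather than collected up front with getResultList().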

    Some thoughts on your code

    This is the part where you read your objects into the list:

        for (final Object[] objects : evntList) {
            final EvntDTO dto = new ExactTargetDailyMailDTO(
                    String.valueOf(objects[0]),
                    String.valueOf(objects[1]), ..., ...);

            if (dto.getEvntName() matches some condition) {
                // change event name and other params accordingly
            }
            eventList.add(dto);
        }

    And later on you iterate over all events in that list in sequential order:

    for (int i = 0; i < eventList.size(); i++) {
    

    I don't see why you couldn't do the work from the second loop directly while reading the rows, instead of first collecting everything into the list.

    The part where you check whether the next entry has the same email as the current one: you should be able to reverse that and check whether the current item has the same email as the previous one. That way you don't have to "peek ahead" in your list (see the sketch below).
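
    Combined with streaming, that look-back idea could look roughly like this; previousEmail, count and the column names are again illustrative, and rs and bw are the streamed result set and file writer from the sketch above:

        String previousEmail = null;
        StringBuilder line = new StringBuilder();
        int count = 0;

        while (rs.next()) {
            String email = rs.getString("email");

            // New user: flush the previous user's line and start a new one.
            if (!email.equalsIgnoreCase(previousEmail)) {
                if (previousEmail != null) {
                    bw.write(line.toString());
                    bw.newLine();
                }
                line.setLength(0);
                line.append(email);            // plus any other user info
                count = 0;
                previousEmail = email;
            }

            // Keep at most 3 events per user, silently ignore the rest.
            if (count < 3) {
                line.append('|').append(rs.getString("event_info"));
                count++;
            }
        }

        // Flush the last user.
        if (previousEmail != null) {
            bw.write(line.toString());
            bw.newLine();
        }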

    You also seem to be doing some filtering based on those email addresses. If you knew the values in remindersList beforehand, you could use them as a condition in your query (AND email IN (?, ?, ?, ...)), reducing the number of rows returned.
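
    If that fits your data, building such an IN clause dynamically with a PreparedStatement is straightforward; this sketch uses an illustrative table and column names rather than your stored procedure:

        import java.sql.Connection;
        import java.sql.PreparedStatement;
        import java.sql.ResultSet;
        import java.util.Collections;
        import java.util.List;

        public final class EmailFilteredQuery {

            /** Returns events only for the given emails; the caller closes the statement and result set. */
            static ResultSet findEvents(Connection con, List<String> emails) throws Exception {
                // One "?" placeholder per known email address.
                String placeholders = String.join(", ", Collections.nCopies(emails.size(), "?"));
                PreparedStatement ps = con.prepareStatement(
                        "SELECT email, event_info FROM events WHERE email IN (" + placeholders + ") ORDER BY email");
                for (int i = 0; i < emails.size(); i++) {
                    ps.setString(i + 1, emails.get(i));
                }
                return ps.executeQuery();
            }
        }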