Search code examples
javaperformancehibernateout-of-memorydata-processing

Hibernate out of memory exception while processing large collection of elements


I am trying to process collection of heavy weight elements (images). Size of collection varies between 8000 - 50000 entries. But for some reason after processing 1800-1900 entries my program falls with java.lang.OutOfMemoryError: Java heap space.

In my understanding each time when I call session.getTransaction().commit() program should free heap memory, but looks like it never happens. What do I do wrong? Here is the code:

private static void loadImages( LoadStrategy loadStrategy ) throws IOException {
    log.info( "Loading images for: " + loadStrategy.getPageType() );

    Session session = sessionFactory.openSession();
    session.setFlushMode( FlushMode.COMMIT );
    Query query = session.createQuery( "from PageRaw where pageType = :pageType and pageStatus = :pageStatus and sessionId = 1" );
    query.setString( "pageStatus", PageStatus.SUCCESS.name() );
    query.setString( "pageType", loadStrategy.getPageType().name() );
    query.setMaxResults( 50 );

    List<PageRaw> pages;
    int resultNum = 0;

    do {

        session.getTransaction().begin();

        log.info( "Get pages statring form " + resultNum + " position" );
        query.setFirstResult( resultNum );
        resultNum += 50;
        pages = query.list();
        log.info( "Found " + pages.size() + " pages" );


        for (PageRaw pr : pages ) {
            Set<String> imageUrls = new HashSet<>();
            for ( UrlLocator imageUrlLocator : loadStrategy.getImageUrlLocators() ) {
                imageUrls.addAll(
                        imageUrlLocator.locateUrls( StringConvector.toString( pr.getSourceHtml() ) )
                );
            }

            removeDeletedImageRaws( pr.getImages(), imageUrls );
            loadNewImageRaws( pr.getImages(), imageUrls );
        }

        session.getTransaction().commit();

    } while ( pages.size() > 0 );

    session.close();
}

Solution

  • It's important to distinguish these two actions:

    • flushing a session executes all pending statements against the database (it synchronizes the in-memory state with the database state);

    • clearing a session purges the session (1st-level) cache, thus freeing memory.

    So you need to both flush and clear a session in order to recover the occupied memory.

    In addition to that, you must disable the 2nd-level cache. Otherwise all (or most of) the objects will remain reachable even after clearing the session.