Search code examples
springpostgresqlhibernatejpabatch-insert

Batch-Insert into Postgres DB using Hibernate/Spring Data: 60K Rows Takes 2 Minutes which is unacceptable


I need to insert 60K rows into a Postgres DB in my Java/Spring application, using Hibernate/Spring Data.

The INSERT data that goes in is (1) USERS_T, (2) the associated new Users must also be in STUDY_PARTICIPANTS_T. Both of these are for 60K records each.

The below is working, but the performance is poor: 60K takes 2 minutes. Note that I'm filling out the Hibernate entity and then doing saveAll based on lists of size 1000.

        UsersT user = new UsersT();
        user.setUsername(study.getAbbreviation().toUpperCase()+subjectId);
        user.setRoleTypeId(new LookupT(150));
        user.setCreatedDate(new Date());
        //...
        List<StudyParticipantsT> participants = new ArrayList<StudyParticipantsT>();
        StudyParticipantsT sp = new StudyParticipantsT();
        sp.setStudyT(study);
        sp.setUsersT(user);
        sp.setSubjectId(subjectId);
        sp.setLocked("N");
        participants.add(sp);
        user.setStudyParticipantsTs(participants);

        // Add to Batch-Insert List; if list size ready for batch-insert, or if at the end of all subjectIds, do Batch-Insert saveAll() and clear the list
        batchInsertUsers.add(user);
        if (batchInsertUsers.size() == 1000 || i == subjectIds.size() - 1) {
            // Log this Batch-Insert
            if(log.isDebugEnabled()){
                log.debug("createParticipantsAccounts() Batch-Insert: Saving " + batchInsertUsers.size() + " records");
            }
            userDAO.saveAll(batchInsertUsers);
            // Reset list
            batchInsertUsers.clear();
        }          

I found a thread where someone was having the same problem, and the only solution they found is to compose a custom Native-SQL INSERT (..), (..), (..) string for each chunk of 1000, and run that manually, cutting out the ORM/Hibernate layer entirely: Need to insert 100000 rows in mysql using hibernate in under 5 seconds

But my INSERTs involve some joined tables. I could take the time to rewrite all these entity statements into a custom SQL myself, but it wouldn't be straightforward.

Are there any other solutions? I'm using - Spring 5.0.2 - Hibernate5.2.12


Solution

  • We improved performance by using SpringJDBC's jdbcTemplate.batchUpdate (no Hibernate) and reserving a Sequence range in advance for any foreign keys.

    We didn't get down to the level of actual N repeated INSERT statements, which the other poster referenced above did; we're still using a framework approach (JDBCTemplate), but at least we don't use Hibernate/ORM anymore. This approach is fast, but it's not as super-fast as the N repeated INSERTs -- but it's acceptable now.

    The actual SpringJDBC Batch-Insert occurs via jdbcTemplate.batchUpdate(sqlInsert, new BatchPreparedStatementSetter() {..}, and we actually split up the batches ourselves -- the BatchPreparedStatementSetter won't automatically split anything up for us, it will just submit that particular batch with a predetermined size.

    /**
     * The following method performs a Native-SQL Batch-Insert of Participant accounts (using JdbcTemplate) to improve performance.
     * Each Participant account requires 2 INSERTs: (1) USERS_T, (2) STUDY_PARTICIPANTS_T (with an FK reference to USERS_T.ID).
     * Since there is no easy way to track the Sequence IDs between the two Batch-Inserts, we reserve the ID range for both tables, and 
     * then manually calculate our own IDs for USERS_T and STUDY_PARTICIPANTS_T ourselves.
     * Initially, domain objects are filled out; then they are added to the Batch List that we submit and clear ourselves.
     * (Originally the Batch-Insert was implemented with Hibernate/HQL, but due to slow performance it was nativized with jdbcTemplate.)
     * 
     * NOTE: The entire method is @Transactional and all data will be rolled back in case of any exceptions in this method (rollbackFor=Exception.class).
     * The updated Sequence values (set during reservation) will not be rolled back in this case, but Sequence gaps are normal. 
     */
    @Override
    @Transactional(readOnly = false, rollbackFor = Exception.class)
    public void createParticipantsAccounts(long studyId, List<String> subjectIds) throws Exception {
    
        int maxInsertParticipantsBatchSize = 1000; // Batch size is 1000
        
        /*
          We need to insert into 2 tables, USERS_T and STUDY_PARTICIPANTS_T. 
          The table STUDY_PARTICIPANTS_T has an FK dependency on USERS_T.ID.
          Since there is no easy way to track the Sequence IDs between the two Batch-Inserts, we reserve the ID range for both tables, 
          and then manually calculate our own IDs for USERS_T and STUDY_PARTICIPANTS_T ourselves.
          The Sequences are immediately updated to the calculated final values to reserve the range. 
         */
        // 1. Obtain current Sequence values
        Integer currUsersTSeqVal = userDAO.getCurrentUsersTSeqVal();
        Integer currStudyParticipantsTSeqVal = studyParticipantsDAO.getCurrentStudyParticipantsTSeqVal();
        // 2. Immediately update the Sequences to the calculated final value (this reserves the ID range immediately)
        // In Postgres, updating the Sequences is: SELECT setval('users_t_id_seq', :val)
        userDAO.setCurrentUsersTSeqVal(currUsersTSeqVal + subjectIds.size());
        studyParticipantsDAO.setCurrentStudyParticipantsTSeqVal(currStudyParticipantsTSeqVal + subjectIds.size());                          
        
        // List for Batch-Inserts, maintained and submitted by ourselves in accordance with our batch size
        List<UsersT> batchInsertUsers = new ArrayList<UsersT>();        
        
        for(int i = 0; i < subjectIds.size(); i++) {
            
            String subjectId = subjectIds.get(i);           
            
            // Prepare domain object (UsersT with associated StudyParticipantsT) to be used in the Native-SQL jdbcTemplate batchUpdate
            UsersT user = new UsersT();
            user.setId(currUsersTSeqVal + 1 + i); // Set ID to calculated value
            user.setUsername(study.getAbbreviation().toUpperCase()+subjectId);
            user.setActiveFlag(true);
            // etc., fill out object, then subobject:
            List<StudyParticipantsT> participants = new ArrayList<StudyParticipantsT>();
            StudyParticipantsT sp = new StudyParticipantsT();
            sp.setId(currStudyParticipantsTSeqVal + 1 + i); // Set ID to caculated value
            // ...etc.
            user.setStudyParticipantsTs(participants);
            
            // Add to Batch-Insert List of Users
            batchInsertUsers.add(user);
            
            // If list size ready for Batch-Insert, or if at the end of all subjectIds, perform Batch Insert (both tables) and clear list
            if (batchInsertUsers.size() == maxInsertParticipantsBatchSize || i == subjectIds.size() - 1) {
                
                // Part 1: Insert batch into USERS_T
                nativeBatchInsertUsers(jdbcTemplate, batchInsertUsers);             
                // Part 2: Insert batch into STUDY_PARTICIPANTS_T
                nativeBatchInsertStudyParticipants(jdbcTemplate, batchInsertUsers);             
                // Reset list
                batchInsertUsers.clear();
            }
        }
    }
    

    The sub-methods for the actual Batch-Insert:

    /**
     * Native-SQL Batch-Insert into USERS_T for Participant Upload.
     * NOTE: This method is part of its Parent's @Transactional. (Note also that we need "final" on the List param for Inner-Class access to this variable.)
     *  
     * @param jdbcTemplate
     * @param batchInsertUsers
     */
    private void nativeBatchInsertUsers(JdbcTemplate jdbcTemplate, final List<UsersT> batchInsertUsers) {
    
        String sqlInsert =  "INSERT INTO PUBLIC.USERS_T (id, password, user_name, created_by, created_date, last_changed_by, last_changed_date, " + 
                                                        "first_name, last_name, organization, phone, lockout_date, lockout_counter, last_login_date, " + 
                                                        "password_last_changed_date, temporary_password, active_flag, uuid, " + 
                                                        "role_type_id, ws_account_researcher_id) " +
                            "VALUES (?, ?, ?, ?, ?, ?, ?, " +
                                    "?, ?, ?, ?, ?, ?, ?, " + 
                                    "?, ?, ?, ?, " + 
                                    "?, ?" +
                                    ") ";
    
        
        jdbcTemplate.batchUpdate(sqlInsert, new BatchPreparedStatementSetter() {
    
            @Override
            public int getBatchSize() {
                return batchInsertUsers.size();
            }
    
            @Override
            public void setValues(PreparedStatement ps, int i) throws SQLException {
                ps.setInt(1, batchInsertUsers.get(i).getId()); // ID (provided by ourselves)
                // etc., set PS for each i-th object
    
            }       
            
        });
        
    }
    
    /**
     * Native-SQL Batch-Insert into STUDY_PARTICIPANTS_T for Participant Upload.
     * NOTE: This method is part of its Parent's @Transactional. (Note also that we need "final" on the List param for Inner-Class access to this variable.)
     *  
     * @param jdbcTemplate
     * @param batchInsertUsers
     */ 
    private void nativeBatchInsertStudyParticipants(JdbcTemplate jdbcTemplate, final List<UsersT> batchInsertUsers) {
        
        String sqlInsert =  "INSERT INTO PUBLIC.STUDY_PARTICIPANTS_T (id, study_id, subject_id, user_id, locked, " +                                                                     "created_by, created_date, last_changed_by, last_changed_date) " + 
                            "VALUES (?, ?, ?, ?, ?, " +
                                    "?, ?, ?, ? " +
                                    ") ";
                
        jdbcTemplate.batchUpdate(sqlInsert, new BatchPreparedStatementSetter() {
    
            @Override
            public int getBatchSize() {
                return batchInsertUsers.size();
            }   
            
            @Override
            public void setValues(PreparedStatement ps, int i) throws SQLException {            
                
                ps.setInt(1, batchInsertUsers.get(i).getStudyParticipantsTs().get(0).getId()); // ID (provided by ourselves)
                // etc. 
            }
            
        });
        
    }