Search code examples
javaoracleperformance

Why the oracle query is slow in java but much faster when it is just directly executed in sqlDeveloper?


I have around 100k records in the db table and when I execute this query in oracle sqlDeveloper, it takes around 33 seconds to fetch the 100k records.

select wflworkflo1_.REF_ID from ( select REF_ID, START_DATE from OS_HISTORYSTEP WHERE START_DATE < to_date('20/09/2023' , 'dd/mm/yyyy') UNION ALL select REF_ID, START_DATE from OS_CURRENTSTEP WHERE START_DATE < to_date('20/09/2023' , 'dd/mm/yyyy')) wflworkflo1_ WHERE REF_ID NOT IN (SELECT DISTINCT REF_ID FROM (select REF_ID, START_DATE from OS_HISTORYSTEP WHERE START_DATE >= to_date('20/09/2023' , 'dd/mm/yyyy') UNION ALL select REF_ID, START_DATE from OS_CURRENTSTEP WHERE START_DATE >= to_date('20/09/2023' , 'dd/mm/yyyy')) INT_WFL GROUP BY REF_ID) group by wflworkflo1_.REF_ID having max(wflworkflo1_.START_DATE) < to_date('20/09/2023' , 'dd/mm/yyyy');

In my java program, this is the method to run the query.

public List<String> findCurrHistStepRefIdWithStartDate(String targetStartDate)
        throws BocException {
    
    StringBuffer sql = new StringBuffer();
    sql.append(
            "select wflworkflo1_.REF_ID from ( select REF_ID, START_DATE from OS_HISTORYSTEP WHERE START_DATE < to_date(?, 'dd/mm/yyyy') UNION ALL select REF_ID, START_DATE from OS_CURRENTSTEP WHERE START_DATE < to_date(?, 'dd/mm/yyyy')) wflworkflo1_ WHERE REF_ID NOT IN (SELECT DISTINCT REF_ID FROM (select REF_ID, START_DATE from OS_HISTORYSTEP WHERE START_DATE >= to_date(?, 'dd/mm/yyyy') UNION ALL select REF_ID, START_DATE from OS_CURRENTSTEP WHERE START_DATE >= to_date(?, 'dd/mm/yyyy')) INT_WFL GROUP BY REF_ID) group by wflworkflo1_.REF_ID having max(wflworkflo1_.START_DATE) < to_date(?, 'dd/mm/yyyy')");
    
    JdbcTemplate template = getJdbcTemplate();
    
    return template.queryForList(sql.toString(), String.class,
            targetStartDate, targetStartDate, targetStartDate, targetStartDate,
            targetStartDate);

}

This Java function takes more than 1 hour to return the result. Why is there such a huge difference?


Solution

  • Rewriting the joins to analytic functions should avoid most potential database problems. Since the rewritten query has no joins, there are very few ways the execution plan can go wrong. As an added bonus, the new code is smaller and arguably simpler.

    -- ref_ids that only have old data.
    select distinct ref_id
    from
    (
        --all data along with counts of new dates per ref_id.
        select ref_id, start_date,
            sum(case when start_date >= to_date('20/09/2023' , 'dd/mm/yyyy') then 1 else 0 end) 
            over (partition by ref_id) new_dates_per_ref_id
        from
        (
            --all data
            select ref_id, start_date from os_historystep
            union all
            select ref_id, start_date from os_currentstep
        )
    )
    where start_date < to_date('20/09/2023' , 'dd/mm/yyyy')
        and new_dates_per_ref_id = 0;
    

    However, this answer is merely a workaround. You may want to dig into the execution plans and find out why this problem is happening. To do that, I recommend you follow astentx's advice and use dbms_sql_monitor.report_sql_monitor. But solving difficult Oracle performance problems can take a long time, so there's nothing wrong with just avoiding the problem if you don't have hours to spend on query tuning.