Search code examples
sqlperformanceindexingsybasesap-ase

Sybase ASE: Optimizing a query with multiple subqueries


I have a large table (large_table, 6M+ rows) with indexed columns

  • id primary key
  • small_ref_id (integer) references the primary key of a small table (small_table, 2k+ rows)
  • mid_ref_id (integer) references the primary key of a medium table (mid_table, 200k+ rows)

  • The data of small_table is divided by groups of data that are named in the indexed column group_name (varchar).

  • The data of mid_table is divided by types of data that are named in the indexed column type (varchar).

  • The large table has 2 indexed timestamps (date_time1,date_time2).

  • The medium table has 1 indexed timestamp (date_time3).

My query looks like :

select * from large_table, mid_table, small_table 
where large_table.small_ref_id=small_table.id
and large_table.mid_ref_id=mid_table.id
and small_table.group_name='MyGroup'
and 
   (large_table.id in (select large_table.id from large_table, mid_table
where mid_table.id=large_table.mid_ref_id 
and large_table.date_time1 between '2010-01-01' and '2017-01-01' 
and mid_table.type='Type1')
or large_table.id in (select large_table.id from large_table, mid_table 
where mid_table.id=large_table.mid_ref_id 
and large_table.date_time2 between '2010-06-01' and '2017-01-01' 
and mid_table.type='Type2')
or large_table.id in (select large_table.id from large_table, mid_table 
where mid_table.id=large_table.mid_ref_id 
and mid_table.date_time3 between '2010-08-01' and '2017-01-01' 
and mid_table.type='Type3'))

It can take several minutes (<5min) to fetch the results.

What I tried :

  • Spliting the query by small_ref_id into as many queries as ids inside 'MyGroup', running in parallel (using a thread pool inside the application, with a fixed number of workers) : This resulted in no improvement, plus taking 100% of database CPU.
  • Replacing the "large_table.id in (select id from large_table" with "exists (select 1 from large_table t where t.id=large_table.id" : This also resulted into no improvement.

Any ideas ?


Solution

  • It seems to me you don't need subqueries. You can just apply the three conditions to the large_table and mid_table data you already have from the main query:

    select     * 
    from       large_table 
    inner join mid_table
            on large_table.mid_ref_id = mid_table.id
    inner join small_table 
            on large_table.small_ref_id = small_table.id
           and small_table.group_name = 'MyGroup'
    where      (
                    (large_table.date_time1 between '2010-01-01' and '2017-01-01' 
                     and mid_table.type = 'Type1')
                or
                    (large_table.date_time2 between '2010-06-01' and '2017-01-01' 
                     and mid_table.type = 'Type2')
                or  
                    (mid_table.date_time3 between '2010-08-01' and '2017-01-01' 
                     and mid_table.type = 'Type3')
               )
    

    As a side note: use the inner join syntax.

    NB: Are you sure about the last condition? It tests on mid_table.date_time3 while the other two conditions take their date from large_table...