Search code examples
sqlamazon-redshiftquery-performancesql-view

How to performance test Redshift Views?


I have a question about testing query performance for views in redshift.

I have two tables: table_a and table_b: - table a and table b have different sort key defined. - table a has 6 fields for sort key. - table b has 4 fields for sort key. - both tables share some column names/types but table a is a super set of table b.

I created a view v_combined. The view combines data from table a and table b based on dates queried. For example if query is made before date XYZ, the view will source table a. Otherwise it sources table b.

create view as v_combined
select a as x, b as y, c as z, to_timestamp(time_field::TEXT, 'YYYYMMDD'):timestamp as date
from table_a
where date < "MY_DATE"

union all

select * from table_b
where date > "MY_DATE"

I did a comparison between the view and corresponding table:

  1. select count(*) from v_combined where date < "MY_DATE"
  2. select count(*) from table_a where date < "MY_DATE"
  3. select count(*) from v_combined where date > "MY_DATE"
  4. select count(*) from table_b where date > "MY_DATE"
  5. select * from v_combined where date < "MY_DATE" limit 10000
  6. select * from table_a where date < "MY_DATE" limit 10000
  7. select * from v_combined where date > "MY_DATE" limit 10000
  8. select * from table_b where date > "MY_DATE" limit 10000

(1) and (2) have similar execution time as expected.

(3) and (4) have similar execution time as expected.

(5) seems to have longer execution time than (6).

(7) seems to have longer execution time than (8).

What is the best way to test the performance of a view in redshift?


Solution

  • I'd say that the best way to test the performance of a view is to run test queries exactly like you did!

    The performance of this particular View will always be poor because it is doing a UNION ALL.

    In (5), it needs to get ALL rows from both tables before applying the LIMIT, whereas (6) only needs to access table_a and can stop as soon as it hits the limit.

    If you need good performance from queries like this, you could consider creating a combined table (rather than a view). Run a daily (or hourly?) script to recreate the table from the combined data. That way, queries will run much faster.