I have a website data, where I track the number of pages that are viewed and a video content that's completely viewed. I have no issues with pulling the data. However, I need to put this into a dashboard and trying to transform the data so that there's a column that would track the content name, other columns to track Viewed and Completed.
Like this:
The Original query to pull the data takes around 3 minutes. But when I updated it to get the transformation:
select distinct site_name, date, page_name, 'View' as content-type, content1_View as content_name, launch_count
from report_qry
UNION
select distinct site_name, date, page_name, 'Complete' as content-type, content1_complete as content_name, launch_count
from report_qry
After adding the unions, the query is now taking 15 Minutes to complete. Is there any workaround or a better way to do this?
The original query gets data from around 10 million records also there's a good chance that the number of contents will increase. Now I just have 2.
If I read your description correctly, the report_qry is a CTE and it took 3 mins to complete.
Your query can be optimized by the following steps: Step 1. UNION -> UNION ALL
UNION means UNION DISTINCT, your subqueries guarante content-types are different between them ('View' VS 'Complete'). So we don't need UNION DISTINCT here. Write it explicitly as UNION ALL will save the distinct step when having union operation on subqueries.
select distinct site_name, date, page_name, 'View' as content-type, content1_View as content_name, launch_count
from report_qry
UNION ALL
select distinct site_name, date, page_name, 'Complete' as content-type, content1_complete as content_name, launch_count
from report_qry
Step 2. Avoid computing report_qry twice by using UNNEST()
report_qry is a CTE, it appears twice in the query and are computed twice. we can rewrite the query by using UNNEST() to avoid that
select distinct
site_name,
date,
page_name,
content-type,
case when content-type = 'View' then content1_View
when content-type = 'Complete' then content1_complete
end as content_name,
launch_count
from report_qry
cross join unnest(ARRAY['View','Complete']) as t(content-type)
Step 3 (Optional). Verify the data to see whether select distinct is necessary. If it is not necessary, remove distinct operator.