Map query segment and locus from SVL_COMPILE to actual query text

When debugging queries in Redshift, it's useful to look at SVL_COMPILE to see where queries are triggering a compile, which obviously gobbles up quite a lot of time if you're doing it by accident. However, all you get in SVL_COMPILE is some idea of 'segment' and 'locus'.

Is there any way to map these numbers to actual parts of a query's text? Or at least, how can one reason about what they might refer to? The segments don't seem to match the number of nodes in an EXPLAIN of the query, for example.

Solution

From the documentation for EXPLAIN, looks like in the returned result of EXPLAIN query

XN Merge  (cost=1002815366604.92..1002815366606.36 rows=576 width=27)
  Merge Key: sum(sales.pricepaid)
  ->  XN Network  (cost=1002815366604.92..1002815366606.36 rows=576 width=27)
        Send to leader
        ->  XN Sort  (cost=1002815366604.92..1002815366606.36 rows=576 width=27)
              Sort Key: sum(sales.pricepaid)
              ->  XN HashAggregate  (cost=2815366577.07..2815366578.51 rows=576 width=27)
                    ->  XN Hash Join DS_BCAST_INNER  (cost=109.98..2815365714.80 rows=172456 width=27)
                          Hash Cond: ("outer".eventid = "inner".eventid)
                          ->  XN Seq Scan on sales  (cost=0.00..1724.56 rows=172456 width=14)
                          ->  XN Hash  (cost=87.98..87.98 rows=8798 width=21)
                                ->  XN Seq Scan on event  (cost=0.00..87.98 rows=8798 width=21)

every line with the -> XN prefix represents a single step of EXPLAIN operation. Most steps provide context in addition to their operator such as XN Sort operator provides the sort key so you can map it to the order by part of your query. However, some of the steps might not be linked to any part of the query such as XN Network where it only send the result to leader node for further processing. You can learn more about the EXPLAIN operator here.

A single segment returned from SVL_COMPILE can include multiple of these EXPLAIN operations mentioned in the output. How operations are divided between segments is indeterministic due to its parallel nature, hence you can only know it after the query is executed. After the query is executed, its result are logged in SVL_QUERY_REPORT and SVL_QUERY_SUMMARY. You can print out the performance report of the query using query

select * from svl_query_report where query = <your query ID> order by segment, step, elapsed_time, rows;

select * from svl_query_summary where query = <your query ID> order by stm, seg, step;

This will output a table with the query's information. The information we need is segment id (seg or segment column), step id (step column) and step label (label column). According to documentation about, EXPLAIN output steps are executed from the bottom up, meaning the segment id and step id is also indexed from the bottom up in 0-index. Also also note step id index is contained within a single segment, so when moving to a new segment, the step id is reset to 0. For example, given the above sample EXPLAIN output and the diagram in documentation for EXPLAIN , the indexing for each XN operations would look something like below

segment=5;step=0 XN Merge  (cost=1002815366604.92..1002815366606.36 rows=576 width=27)
  Merge Key: sum(sales.pricepaid)
segment=4;step=0  ->  XN Network  (cost=1002815366604.92..1002815366606.36 rows=576 width=27)
        Send to leader
segment=3;step=1        ->  XN Sort  (cost=1002815366604.92..1002815366606.36 rows=576 width=27)
                               Sort Key: sum(sales.pricepaid)
segment=2;step=2|segment=3;step=0              ->  XN HashAggregate  (cost=2815366577.07..2815366578.51 rows=576 width=27)
segment=2;step=2                    ->  XN Hash Join DS_BCAST_INNER  (cost=109.98..2815365714.80 rows=172456 width=27)
segment=2;step=1                          Hash Cond: ("outer".eventid = "inner".eventid)
segment=2;step=0                          ->  XN Seq Scan on sales  (cost=0.00..1724.56 rows=172456 width=14)
segment=1;step=0                          ->  XN Hash  (cost=87.98..87.98 rows=8798 width=21)
segment=0;step=0                                ->  XN Seq Scan on event  (cost=0.00..87.98 rows=8798 width=21)

To get more context on the step id actual EXPLAIN operations, you can read its label in the label column. Each label column can be mapped to a specific operation. You can check here for the full map table

By tracing back each segment to each EXPLAIN steps, you can then read their performance on the table SVL_QUERY_REPORT and SVL_QUERY_SUMMARY. You can check here and here for more details

Regarding why the number of segments doesn't match the number of nodes in an EXPLAIN, from the overview documentation, segment is the smallest compilation unit executable by a compute node. It means a segment can only be processed by one node but one node can process multiple segments, hence the difference in number of nodes and segments.