amazon-web-services database-performance amazon-redshift

Amazon Redshift: Finding and fixing a skewed DISTKEY


From my Redshift cluster's performance panel, I can see that one node holds roughly twice as much data as the others, which also leads to significantly higher CPU utilization on that node. The database has a few dozen large tables using key-based distribution, and I haven't been able to find out which ones aren't properly balanced.

Searching the documentation, I saw that the SVV_TABLE_INFO view has a column called skew_rows. Is that the number I'm looking for?


Solution

  • I think the "pct_skew_across_slices" in this article is what you're looking for.

    http://docs.aws.amazon.com/redshift/latest/dg/c_analyzing-table-design.html
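
On the skew_rows column from the question: SVV_TABLE_INFO documents it as the ratio of the number of rows in the slice with the most rows to the number of rows in the slice with the fewest, so it can also be used to rank tables by imbalance. The sketch below is only an illustration of that idea, not an official script. It holds a query against SVV_TABLE_INFO as a string and adds two small helpers for working with the result. The helper names (skew_ratio, most_skewed), the 1.5 threshold, and the handling of empty slices are assumptions made for this example.

```python
from typing import Iterable, List, Optional, Tuple

# Query sketch: list key-distributed tables, most skewed first.
# Column names are those documented for SVV_TABLE_INFO.
SKEW_QUERY = """
SELECT "schema", "table", diststyle, tbl_rows, skew_rows
FROM svv_table_info
WHERE diststyle LIKE 'KEY%'
ORDER BY skew_rows DESC NULLS LAST;
"""

# One result row: (schema, table, diststyle, tbl_rows, skew_rows)
Row = Tuple[str, str, str, int, Optional[float]]


def skew_ratio(rows_per_slice: Iterable[int]) -> Optional[float]:
    """Rows in the fullest slice divided by rows in the emptiest slice.

    Mirrors the documented meaning of skew_rows. Returning None for a
    table with no rows and infinity when some slice is empty is a choice
    made for this sketch, not documented Redshift behavior.
    """
    counts = list(rows_per_slice)
    if not counts or max(counts) == 0:
        return None
    smallest = min(counts)
    if smallest == 0:
        return float("inf")
    return max(counts) / smallest


def most_skewed(rows: Iterable[Row], threshold: float = 1.5) -> List[Tuple[str, str, float]]:
    """Keep tables whose skew_rows is at or above the (arbitrary) threshold."""
    flagged = [
        (schema, table, float(skew))
        for schema, table, _diststyle, _tbl_rows, skew in rows
        if skew is not None and float(skew) >= threshold
    ]
    # Worst offenders first.
    return sorted(flagged, key=lambda item: item[2], reverse=True)
```

The query can be run with whatever SQL client is already connected to the cluster, and the fetched rows passed to most_skewed. A value close to 1 means rows are spread evenly across slices; the tables at the top of the list are the first candidates for a different DISTKEY.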