azure-data-lake u-sql

Does a clustered index in a U-SQL table impact parallelism?


We are working with U-SQL tables and have questions about clustered indexes. In a U-SQL table, parallelism is governed by how the data is partitioned and distributed. Does the clustered index affect parallelism as well? Second, how does U-SQL manage data skew within a distribution bucket?


Solution

  • A clustered index does not affect parallelism per se. Depending on the query predicate, it determines whether the data can be read with an index seek or has to be read with an index scan, so it affects the performance of accessing the data inside a vertex.

    Data skew is not managed for you. If your data is skewed, you will have to find a better distribution key, use a SKEWFACTOR hint, or use ROUND ROBIN distribution.
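
To make the distinction concrete, here is a minimal U-SQL sketch. The table names, columns, bucket count, output path and the 0.5 skew factor are all hypothetical and chosen only for illustration. The DISTRIBUTED BY clause decides how rows are spread across distribution buckets (and therefore how work can be parallelized), while the CLUSTERED index only controls the sort order inside each bucket. Check the exact hint syntax against the U-SQL reference before relying on it.

```usql
// Hypothetical table: hash distribution decides which bucket a row lands in;
// the clustered index only orders the rows inside each bucket.
CREATE TABLE IF NOT EXISTS dbo.OrdersByCustomer
(
    OrderId int,
    CustomerId int,
    OrderDate DateTime,
    Amount decimal,
    INDEX idx_OrdersByCustomer CLUSTERED (CustomerId ASC)
    DISTRIBUTED BY HASH (CustomerId) INTO 20
);

// Alternative for a skewed key: round-robin distribution spreads rows evenly
// across buckets regardless of value. Rows are still sorted by CustomerId
// within each bucket.
CREATE TABLE IF NOT EXISTS dbo.OrdersRoundRobin
(
    OrderId int,
    CustomerId int,
    OrderDate DateTime,
    Amount decimal,
    INDEX idx_OrdersRoundRobin CLUSTERED (CustomerId ASC)
    DISTRIBUTED BY ROUND ROBIN INTO 20
);

// A predicate on the index key is the kind of query where the optimizer
// can seek into a bucket instead of scanning all of it.
@oneCustomer =
    SELECT OrderId, OrderDate, Amount
    FROM dbo.OrdersByCustomer
    WHERE CustomerId == 42;

// If the distribution key cannot be changed, a SKEWFACTOR hint tells the
// optimizer that CustomerId is skewed (0 = no skew, 1 = very heavy skew).
// The value 0.5 is an assumed placeholder, not a recommendation.
@orders =
    SELECT CustomerId, Amount
    FROM dbo.OrdersByCustomer
    OPTION(SKEWFACTOR(CustomerId) = 0.5);

@totals =
    SELECT CustomerId, SUM(Amount) AS Total
    FROM @orders
    GROUP BY CustomerId;

OUTPUT @oneCustomer TO "/output/one_customer.csv" USING Outputters.Csv();
OUTPUT @totals TO "/output/totals.csv" USING Outputters.Csv();
```

With round-robin distribution, a lookup on CustomerId can no longer be narrowed to a single bucket, so the trade-off is even bucket sizes against less selective reads.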