Tags: snowflake-cloud-data-platform, warehouse

The principle behind the effect of Snowflake warehouse size


I'm experimenting with Snowflake and would like to understand how its architecture produces the results below. I ran three queries, A, B, and C, that scan different amounts of data; the Bytes Scanned values are read from Total Statistics in each query's Profile Overview. I compared the execution times on a Small warehouse and a Large warehouse. When a query's scan size is small, the effect of warehouse size is small; as the scan size grows, the performance ratio approaches the 4x difference in warehouse size (Small has 2 nodes, Large has 8).

I would like to understand the principle behind this result.

Total Execution Time:

| Query  | Bytes Scanned (MB) | Large (ms) | Small (ms) | S/L ratio |
|--------|--------------------|------------|------------|-----------|
| QueryA | 1860               | 1350       | 2800       | 2.1       |
| QueryB | 6100               | 3800       | 12500      | 3.3       |
| QueryC | 51940              | 19310      | 77000      | 4.0       |

Solution

  • It's related to how Snowflake (or any analytical MPP database) processes data in parallel. As you know, a warehouse cluster may have multiple nodes: a Small warehouse has 2, a Large warehouse has 8, and so on. Each node has 8 instances that process data in parallel; I will call them "workers" for clarity.

    So a Small warehouse has 2 × 8 = 16 workers, and a Large warehouse has 8 × 8 = 64 workers. If the data size is small (the table has fewer micro-partitions than there are workers), your query will not use the whole warehouse's resources: some workers will stay idle during execution steps such as reading from the table. This is why you get more of a speedup from the large warehouse when you process larger data.
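
    A rough way to see why the S/L ratio converges on 4 is to model a table scan as "waves" of micro-partition reads: in each wave every worker reads one micro-partition, so scan time is proportional to ceil(partitions / workers). The sketch below is a minimal illustration under that assumption; the wave model and the partition counts are illustrative, not Snowflake internals:

    ```python
    import math

    WORKERS_PER_NODE = 8  # per the answer above: 8 parallel workers per node

    def scan_waves(micro_partitions: int, nodes: int) -> int:
        """Number of sequential 'waves' needed for the workers to read
        every micro-partition once (assumed proportional to scan time)."""
        workers = nodes * WORKERS_PER_NODE
        return math.ceil(micro_partitions / workers)

    for partitions in (20, 200, 2000, 20000):
        small = scan_waves(partitions, nodes=2)  # Small warehouse: 16 workers
        large = scan_waves(partitions, nodes=8)  # Large warehouse: 64 workers
        print(f"{partitions:>6} partitions -> S/L ratio = {small / large:.2f}")
    ```

    With few micro-partitions, most of the Large warehouse's extra workers sit idle, so the ratio stays near 2; as the partition count grows, both warehouses are fully busy and the ratio approaches the 4x node-count difference, matching the trend in the measurements above (2.1, 3.3, 4.0).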