azure-synapse delta-lake

Why would someone use a Delta Lake over a dedicated SQL Pool?


From what I've read so far, I have come to the following conclusion: a dedicated SQL Pool can do everything a Delta Lake can, such as ACID transactions, scaling, and handling both batch and streaming data. So what are the differences between using a Delta Lake and a dedicated SQL Pool?


Solution

  • There are a number of differences.

    1. Dedicated SQL Pool is an "always-on" resource. Unless you pause it, you pay 24x7 for the service to be live whether you are using it or not, and it is an expensive option. The engines that read and write Delta tend to be consumption-based, so you only pay for the compute you actually use.

    2. Dedicated SQL Pool is SQL-focused. Data is stored in physical tables, which need a proper design to perform well (columnstore vs. clustered indexes, additional indexes, partitioning and distribution schemes, etc.). Overall, Dedicated SQL Pool requires a lot more know-how and management.

    3. Delta is really just a storage format and can be consumed natively by Notebooks. In Synapse (and now Fabric), Delta can also be queried with SQL, so it is more flexible than a pure SQL environment.

    4. Delta Lake tables (or Lake tables) are really just metadata constructs over the Delta files in storage. In Fabric, both Lakehouse and Warehouse are backed by Delta. Lakehouse is a Notebook-centric environment (that also supports SQL endpoints), similar to the Lake tables in Synapse. Warehouse is a pure SQL environment like Dedicated SQL Pool, but without all the extra overhead. Both are billed against the same Fabric capacity.
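
To make point 4 a bit more concrete, here is a rough sketch of what "metadata over files" means. A Delta table is a folder of Parquet files plus a `_delta_log` folder of JSON commit files, and each commit lists actions such as `add` and `remove`. The commit contents and file names below are made up for illustration, and the `live_files` helper is hypothetical, not part of any Delta library. Real commits also carry `protocol`, `metaData` and `commitInfo` actions, file statistics and checkpoints, all of which are ignored here.

```python
import json

# Hypothetical, heavily simplified commit files as they might appear
# under <table>/_delta_log/. Each line in a commit is one JSON action.
commits = {
    "00000000000000000000.json": [
        '{"add": {"path": "part-0000.parquet"}}',
        '{"add": {"path": "part-0001.parquet"}}',
    ],
    "00000000000000000001.json": [
        '{"remove": {"path": "part-0000.parquet"}}',
        '{"add": {"path": "part-0002.parquet"}}',
    ],
}


def live_files(commits, as_of_version=None):
    """Replay commits in version order and return the data files that
    make up the table at the requested version (latest if None)."""
    files = set()
    for name in sorted(commits):
        version = int(name.split(".")[0])
        if as_of_version is not None and version > as_of_version:
            break
        for line in commits[name]:
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return sorted(files)


print(live_files(commits))                   # ['part-0001.parquet', 'part-0002.parquet']
print(live_files(commits, as_of_version=0))  # ['part-0000.parquet', 'part-0001.parquet']
```

Any engine that can replay this log (a Spark notebook, a SQL endpoint, and so on) sees the same table, which is why the same Delta files can be shared by Notebook and SQL workloads. Replaying only up to an earlier version is also the basic idea behind time travel.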