azure azure-data-lake azure-synapse azure-data-lake-gen2

Should you use the primary storage of Azure Synapse as your data lake?

Creating an Azure Synapse Analytics workspace requires an ADLS Gen2 account. The documentation says the following about the purpose of this storage:

Your Azure Synapse workspace will use this storage account as the "primary" storage account and the container to store workspace data. The workspace stores data in Apache Spark tables. It stores Spark application logs under a folder called /synapse/workspacename.

Should you use this storage account to build a data lake? Or should you use an additional ADLS Gen2 account so as not to interfere with Synapse's workspace data?

Solution

The Synapse metadata (and the Spark database) is housed in the file system (container) you specified at workspace creation time. Our convention is to name it "synapseroot". Do not use this container for any other purpose; let the system manage its contents. You can, however, create other containers in the same account to work with your own data, so you should not need an additional ADLS account.
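
As a rough illustration of that layout, here is a minimal Python sketch that builds `abfss://` paths for data lake files while refusing to point at the workspace container. The account name `contosolake` and the container name `raw` are hypothetical placeholders, `abfss_uri` is a helper invented for this example (not part of any Azure SDK), and `synapseroot` is only the naming convention mentioned above.

```python
# Assumption: "synapseroot" is the container chosen at workspace creation time.
WORKSPACE_CONTAINER = "synapseroot"  # reserved for Synapse-managed data


def abfss_uri(account: str, container: str, path: str = "") -> str:
    """Build an ADLS Gen2 URI, rejecting the workspace's own container."""
    if container.lower() == WORKSPACE_CONTAINER:
        raise ValueError(
            f"'{container}' is reserved for Synapse workspace data; "
            "use a separate container for data lake files"
        )
    # Strip a leading slash so the URI never contains a double slash.
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


if __name__ == "__main__":
    # Hypothetical account and container names.
    print(abfss_uri("contosolake", "raw", "/sales/orders.parquet"))
```

The guard is just a convenience: nothing in the storage account itself stops you from writing to the workspace container, so a check like this is one way to keep pipelines from doing so by accident.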