Search code examples
azureazure-synapse

How to save SQL query results from Azure Synapse to cloud storage files


I want to run an SQL query in Azure Synapse and store the results in some format (like compressed CSV) in Azure Blob or Azure Data Lake. Ideally the results will be a split into a set of blobs of adjustable size (like 32MB each) so we can download and consume them in parallel.

We have already figured out the opposite -- bulk loading from cloud storage is fairly well documented.

In Snowflake, we do this by creating an internal stage, using COPY INTO @stage FROM (query) and then downloading the stage files using the GET command. Does Synapse supported something similar?


Solution

  • You can utilize the CREATE EXTERNAL TABLE AS SELECT (CETAS) statement in both dedicated and serverless SQL pools to achieve the following:

    • Create an external table
    • Export the results of a Transact-SQL SELECT statement in parallel to Hadoop, Azure Storage Blob, or Azure Data Lake Storage Gen2

    You can create an external table and then export the results of a Transact-SQL SELECT statement in parallel.

    Know more about the CREATE EXTERNAL TABLE AS SELECT (CETAS) (Transact-SQL)

    To create an external table in Azure Synapse serverless, follow the steps below:

    Step 1: Right-click on the folder/file you want to create an external table for:

    enter image description here

    Step 2:

    enter image description here

    Step 3:

    enter image description here

    Results:

    CREATE EXTERNAL TABLE dbo.emp_ext_tbl01 (
    [emp_id] Int,
    [emp_name] nvarchar(4000),
    [emp_role] nvarchar(4000),
    [emp_salary] int
    )
    WITH (
    LOCATION = 'employeetbl.csv',
    DATA_SOURCE = [folder02_stgforsynp_dfs_core_windows_net],
    FILE_FORMAT = [SynapseDelimitedTextFormat]
    )
    GO
    SELECT TOP 100 * FROM dbo.emp_ext_tbl01
    

    enter image description here

    Note: As mentioned, if you want to create a staging folder in a storage account, you can use COPY ACTIVITY in Synapse, which directly supports the creation of an internal stage.

    Reference: Synapse Analytics save a query result in a blob storage