
Azure Synapse Dataset


I am very new to Azure Synapse and have come across the topic of ‘Datasets’. I am confused about why we need datasets instead of referencing our data file directly when creating an integration dataset.

Also, what is the difference between a linked service and a dataset? Can’t we link our data directly through linked services?


Solution

  • Linked Services

    Linked services define the service-level connection, including any required authentication. Examples include Azure SQL Database, Storage Account, and SFTP Server. As such, a Linked Service references the service, not the data contained within it.

    A Linked Service can be referenced many times, and all Datasets require a Linked Service reference.
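
    As a rough illustration of this one-to-many relationship, the sketch below builds the definitions as Python dictionaries in the JSON shape that Synapse and Data Factory resource definitions generally follow. All names (`ExampleStorageLS`, `SalesCsv`, `OrdersCsv`) and the connection string placeholder are hypothetical, and the dataset bodies are trimmed to the parts that matter here.

```python
import json

# Hypothetical linked service: describes how to reach a storage account,
# not which files to read from it. Names and values are placeholders.
linked_service = {
    "name": "ExampleStorageLS",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {
            # Placeholder only; real credentials belong in a secret store.
            "connectionString": "<connection-string-placeholder>",
        },
    },
}


def linked_service_reference(service: dict) -> dict:
    """Build the reference object a dataset uses to point at a linked service."""
    return {"referenceName": service["name"], "type": "LinkedServiceReference"}


# Two hypothetical datasets that reuse the same linked service.
sales_dataset = {
    "name": "SalesCsv",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": linked_service_reference(linked_service),
    },
}
orders_dataset = {
    "name": "OrdersCsv",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": linked_service_reference(linked_service),
    },
}

print(json.dumps(sales_dataset, indent=2))
```

    Both datasets carry only the name of the linked service, so changing the connection or its authentication in one place affects every dataset that references it.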

  • Datasets

    Datasets define the runtime access to resources/data contained in the Linked Service. Examples would be SQL tables, Containers/Folders/Blobs in Storage, files in SFTP, and many others. The Dataset type will determine what kind of Linked Service reference is required.

    Datasets are extremely flexible. They can point directly to specific resources, define parameters that are supplied at runtime, or combine the two approaches. They can define a schema or leave it out. You need to configure a Dataset to meet your specific needs.
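
    To make the fixed versus parameterized distinction concrete, here is a minimal sketch that produces both variants of a delimited-text dataset as Python dictionaries. The helper `make_csv_dataset`, the dataset names, the container, and the file name are all made up for illustration; the `@dataset().fileName` expression is the usual way a dataset refers to its own parameter, but treat the exact property layout as an assumption to check against your workspace's JSON view.

```python
import json

# Hypothetical reference to an existing linked service.
LINKED_SERVICE_REF = {
    "referenceName": "ExampleStorageLS",
    "type": "LinkedServiceReference",
}


def make_csv_dataset(name, container, file_name=None):
    """Return a dataset definition; file_name=None turns the file into a parameter."""
    location = {"type": "AzureBlobStorageLocation", "container": container}
    properties = {
        "type": "DelimitedText",
        "linkedServiceName": dict(LINKED_SERVICE_REF),
    }
    if file_name is None:
        # Parameterized: the caller (an activity) supplies the file at runtime.
        properties["parameters"] = {"fileName": {"type": "String"}}
        location["fileName"] = {
            "value": "@dataset().fileName",
            "type": "Expression",
        }
    else:
        # Fixed: the dataset always points at one specific file.
        location["fileName"] = file_name
    # No schema is declared here, which a dataset is allowed to omit.
    properties["typeProperties"] = {
        "location": location,
        "columnDelimiter": ",",
        "firstRowAsHeader": True,
    }
    return {"name": name, "properties": properties}


fixed_dataset = make_csv_dataset("SalesJanuary", "input", "sales-2024-01.csv")
generic_dataset = make_csv_dataset("AnyCsvInInput", "input")

print(json.dumps(generic_dataset, indent=2))
```

    The fixed variant is simpler to read, while the parameterized one can serve many files from a single definition.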

    It is important to understand that the Dataset does not hold the data itself. Rather, it is used by specific activities (such as Lookup, Copy, Data Flow, etc.) to access the data.

    So the short answer is you CAN create a direct reference to your data, but you do it via a Dataset.
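
    The full chain (activity, then Dataset, then Linked Service) might look like the sketch below, which builds a Copy activity definition as a Python dictionary. The activity name, dataset names, parameter value, and the choice of source and sink types are assumptions for illustration, not taken from any real pipeline.

```python
import json


def dataset_reference(name, parameters=None):
    """Build the reference an activity uses to point at a dataset by name."""
    ref = {"referenceName": name, "type": "DatasetReference"}
    if parameters:
        # Values for the dataset's own parameters, supplied at runtime.
        ref["parameters"] = parameters
    return ref


# Hypothetical Copy activity: it never names a file path or a connection
# directly, only the datasets that describe them.
copy_activity = {
    "name": "CopySalesToSql",
    "type": "Copy",
    "inputs": [
        dataset_reference("AnyCsvInInput", {"fileName": "sales-2024-01.csv"})
    ],
    "outputs": [dataset_reference("SalesTable")],
    "typeProperties": {
        "source": {"type": "DelimitedTextSource"},
        "sink": {"type": "AzureSqlSink"},
    },
}

print(json.dumps(copy_activity, indent=2))
```

    The activity only knows dataset names, each dataset only knows a linked service name, and the linked service is the single place where the connection and authentication live.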