Copy activity (from Cosmos DB SQL API to ADLS Gen2) failing in Synapse


I am trying to run a pipeline that copies data from Cosmos DB (SQL API) to ADLS Gen2 for multiple tables. A Lookup activity passes a list of queries, and a Copy activity runs inside a ForEach loop using a self-hosted IR. However, it keeps failing after the first iteration with the error below:

Operation on target Copy data1_copy1 failed: Failure happened on 'Sink' side. ErrorCode=UserErrorFailedFileOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Upload file failed at path tfs/OU Cosmos Data/LATAM/fact\dl-br-prod.,Source=Microsoft.DataTransfer.Common,''Type=Microsoft.Azure.Documents.RequestTimeoutException,Message=Request timed out.

I am also sure the issue is not with any one specific table: I have tried passing the queries in a different order, and in each attempt the first query completes successfully, while in the remaining iterations the Copy activity runs for some time and eventually fails.

I have tried the following so far:

  1. Running ForEach in sequential mode
  2. Changing the block size (in MB) on the sink side to 20 MB (the default is 100 MB)
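
For reference, the two settings tried above map roughly onto the pipeline definition as sketched here. This is only an illustration of where `isSequential` and `blockSizeInMB` live, built as a Python dict and printed as JSON. The activity names `Lookup1` and `ForEach1`, the `query` property on each Lookup row, and the `JsonSink` output format are assumptions, not taken from the actual pipeline, and the dataset references and integration runtime settings are omitted.

```python
import json


def build_foreach_activity(block_size_mb=20, sequential=True):
    """Return a ForEach activity definition as a plain dict (illustrative only)."""
    copy_activity = {
        "name": "Copy data1",
        "type": "Copy",
        "typeProperties": {
            "source": {
                "type": "CosmosDbSqlApiSource",
                # Assumption: each Lookup row exposes the query text as `query`.
                "query": {"value": "@item().query", "type": "Expression"},
            },
            "sink": {
                # Assumption: JSON output; the real pipeline may use another format.
                "type": "JsonSink",
                "storeSettings": {
                    "type": "AzureBlobFSWriteSettings",
                    # Attempt 2: smaller write blocks on the ADLS Gen2 sink.
                    "blockSizeInMB": block_size_mb,
                },
            },
        },
    }
    return {
        "name": "ForEach1",  # hypothetical name
        "type": "ForEach",
        "typeProperties": {
            # Attempt 1: run the iterations one at a time.
            "isSequential": sequential,
            "items": {
                "value": "@activity('Lookup1').output.value",
                "type": "Expression",
            },
            "activities": [copy_activity],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_foreach_activity(), indent=2))
```

Neither setting changes how long a single Cosmos DB request may take, which is consistent with both attempts leaving the timeout in place.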

Solution

  • I was able to get a response from the Microsoft Cosmos DB product team:

    Root cause:

    The SDK client is configured with a fixed timeout value, and the request is taking longer than that.

    The reason for the timeouts is an increase in Gateway latency (the Gateway has no latency SLA) due to the large result size. This is probably expected (more data tends to take longer to be read, sent, and received).

    Resolution:

    Increase the RequestTimeout used in the client.

    The team owning the Synapse Data Transfer (which uses the .NET 2.5.1 SDK and owns the Microsoft.DataTransfer application) can increase the RequestTimeout used on the .NET SDK to a higher value. In newer SDK versions, this value is 65 seconds by default.

    However, we have opted to bypass this route altogether and use either Synapse Link or a Private Endpoint instead.
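
To make the root cause above concrete, here is a toy model of why a larger result can exceed a fixed client timeout. It is a sketch only: the throughput, base latency, result size, and the 10-second timeout are hypothetical numbers chosen for illustration, not measurements from the Gateway or the SDK. Only the 65-second figure comes from the response above.

```python
def estimated_request_seconds(result_mb, throughput_mb_per_s, base_latency_s):
    """Rough time for one request: fixed latency plus time to transfer the result."""
    return base_latency_s + result_mb / throughput_mb_per_s


def times_out(result_mb, timeout_s, throughput_mb_per_s=2.0, base_latency_s=0.5):
    """True if the estimated request time exceeds the client's timeout.

    The default throughput and latency are hypothetical illustration values.
    """
    estimate = estimated_request_seconds(result_mb, throughput_mb_per_s, base_latency_s)
    return estimate > timeout_s


if __name__ == "__main__":
    result_mb = 100  # hypothetical large query result
    for timeout_s in (10, 65):  # 10 s is hypothetical; 65 s is the newer default cited above
        print(f"timeout={timeout_s}s -> times out: {times_out(result_mb, timeout_s)}")
```

Under these assumed numbers, the same request fails with the short timeout and succeeds with the longer one, which is why raising RequestTimeout is the suggested fix.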