I cannot copy a 30 GB SQL Server table from this machine to my Azure Data Lake Storage Gen2 account as a 530 MB Parquet file using Azure Data Factory. The compression type is gzip, and the throughput is 11.8 MB/s.
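The copy activity looks roughly like this (a simplified sketch; the dataset names are placeholders, and the activity name is the one that appears in the error below):

{
    "name": "Copy Latest Source Data",
    "type": "Copy",
    "description": "Simplified sketch; dataset names are placeholders",
    "inputs": [ { "referenceName": "SqlServerTableDataset", "type": "DatasetReference" } ],
    "outputs": [ { "referenceName": "ParquetSinkDataset", "type": "DatasetReference" } ],
    "typeProperties": {
        "source": { "type": "SqlServerSource" },
        "sink": { "type": "ParquetSink" }
    }
}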
The error message from the failed ADF copy activity is:
{ "errorCode": "2200", "message": "Failure happened on 'Sink' side. ErrorCode=UserErrorFailedBlobFSOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=BlobFS operation failed for: A task was canceled.. Account: 'datalake'. FileSystem: &aposcontainer-dl'. Path: 'ImportLayer/F61ILBarAcct_Txns.parquet'.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Threading.Tasks.TaskCanceledException,Message=A task was canceled.,Source=mscorlib,'", "failureType": "UserError", "target": "Copy Latest Source Data" }
The same error appears in the log of the self-hosted integration runtime on the client machine:
DEBUG:
TraceComponentId: TransferClientLibrary
TraceMessageId: BlobFSOperationRetry
@logId: Warning
jobId: c063e070-cc12-4cae-895f-f8ada2bfa3ff
activityId: ecfa652d-8471-4297-be2a-4ecc0ebc89c5
eventId: BlobFSOperationRetry
message: 'Type=System.Threading.Tasks.TaskCanceledException,Message=A task was canceled.,Source=mscorlib,StackTrace= at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.Storage.Data.AzureDfsClient.<UpdatePathWithHttpMessagesAsync>d__41.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.Storage.Data.AzureDfsClientExtensions.<UpdatePathAsync>d__24.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Microsoft.Azure.Storage.Data.AzureDfsClientExtensions.UpdatePath(IAzureDfsClient operations, String action, String filesystem, String path, Nullable`1 position, Nullable`1 retainUncommittedData, String contentLength, String xMsLeaseAction, String xMsLeaseId, String xMsCacheControl, String xMsContentType, String xMsContentDisposition, String xMsContentEncoding, String xMsContentLanguage, String xMsProperties, String ifMatch, String ifNoneMatch, String ifModifiedSince, String ifUnmodifiedSince, Stream requestBody, String xMsClientRequestId, Nullable`1 timeout, String xMsDate)
at Microsoft.Azure.Storage.Data.BlobFSClient.<>c__DisplayClass37_0.<AppendFile>b__1()
at Microsoft.Rest.TransientFaultHandling.RetryPolicy.<>c__DisplayClass16_0.<ExecuteAction>b__0()
at Microsoft.Rest.TransientFaultHandling.RetryPolicy.ExecuteAction[TResult](Func`1 func),'
On the client machine, the CPU is an Intel Xeon E7-2830 @ 2.13 GHz on a 64-bit OS. It has 16.0 GB of RAM and a 40 GB hard drive with 10 GB of free space. I increased the virtual memory maximum to 10 GB so the page file can use that free space. For the Java option, I set -Xmx (the Java maximum heap size) to 26 GB to take advantage of the combined RAM and page file. I can only use this client machine, which has the integration runtime installed on it.
What could be the problem?
I managed to solve it by using the compression type snappy instead of gzip, which uses less processing power.
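The only change needed is the compressionCodec property on the Parquet sink dataset. A simplified sketch (the linked service name is a placeholder; the file system and path are the ones from the error above):

{
    "name": "ParquetSinkDataset",
    "properties": {
        "description": "Simplified sketch; linked service name is a placeholder",
        "type": "Parquet",
        "linkedServiceName": { "referenceName": "DataLakeGen2LinkedService", "type": "LinkedServiceReference" },
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileSystem": "container-dl",
                "folderPath": "ImportLayer",
                "fileName": "F61ILBarAcct_Txns.parquet"
            },
            "compressionCodec": "snappy"
        }
    }
}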
In addition, I ran the copies one at a time instead of many at once. It is slower but safer.
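One way to enforce that in the pipeline is to chain the copy activities with dependsOn, so each copy starts only after the previous one succeeds instead of running in parallel (a simplified sketch; activity names are placeholders and the dataset references are omitted):

{
    "name": "SequentialCopiesPipeline",
    "properties": {
        "description": "Simplified sketch; activity names are placeholders, dataset references omitted",
        "activities": [
            {
                "name": "Copy Table 1",
                "type": "Copy",
                "typeProperties": {
                    "source": { "type": "SqlServerSource" },
                    "sink": { "type": "ParquetSink" }
                }
            },
            {
                "name": "Copy Table 2",
                "type": "Copy",
                "dependsOn": [
                    { "activity": "Copy Table 1", "dependencyConditions": [ "Succeeded" ] }
                ],
                "typeProperties": {
                    "source": { "type": "SqlServerSource" },
                    "sink": { "type": "ParquetSink" }
                }
            }
        ]
    }
}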