Why does Azcopy work for small files but not large ones in Azure?


I can copy a folder or a small file with the AzCopy copy command, but if I apply the same command to a large file, the transfer fails with no clear logs.

Command:

azcopy copy 'test' 'https://mystorage.blob.core.windows.net/...pKmRpGCVO8B' --recursive --preserve-smb-info=true

test is a folder containing a large file (approx. 10 GB).

The log simply stops when the process gets killed. The output is as follows:

INFO: Scanning...
INFO: Any empty folders will not be processed, because source and/or destination doesn't have full folder support

Job a50e4368-5fd0-534b-657d-70216b0e2 has started
Log file is located at: /root/.azcopy/a50e4368-5fd0-534b-657d-70216b0e2.log

Killed

The log file only shows:

cat /root/.azcopy/a50e4368-5fd0-534b-657d-70216b0e2.log
2023/05/23 17:51:19 AzcopyVersion  10.18.1
2023/05/23 17:51:19 OS-Environment  linux
2023/05/23 17:51:19 OS-Architecture  amd64
2023/05/23 17:51:19 Log times are in UTC. Local time is 23 May 2023 17:51:19
2023/05/23 17:51:20 ISO 8601 START TIME: to copy files that changed before or after this job started, use the parameter --include-before=2023-05-23T17:51:14Z or --include-after=2023-05-23T17:51:14Z
2023/05/23 17:51:20 Any empty folders will not be processed, because source and/or destination doesn't have full folder support
2023/05/23 17:51:20 Job-Command copy test https://[...]rw&sr=c&st=2023-05-23t15%3A56%3A08z&sv=2022-11-02 --recursive --preserve-smb-info=true
2023/05/23 17:51:20 Number of CPUs: 1
2023/05/23 17:51:20 Max file buffer RAM 0.500 GB
2023/05/23 17:51:20 Max concurrent network operations: 32 (Based on number of CPUs. Set AZCOPY_CONCURRENCY_VALUE environment variable to override)
2023/05/23 17:51:20 Check CPU usage when dynamically tuning concurrency: true (Based on hard-coded default. Set AZCOPY_TUNE_TO_CPU environment variable to true or false override)
2023/05/23 17:51:20 Max concurrent transfer initiation routines: 64 (Based on hard-coded default. Set AZCOPY_CONCURRENT_FILES environment variable to override)
2023/05/23 17:51:20 Max enumeration routines: 16 (Based on hard-coded default. Set AZCOPY_CONCURRENT_SCAN environment variable to override)
2023/05/23 17:51:20 Parallelize getting file properties (file.Stat): false (Based on AZCOPY_PARALLEL_STAT_FILES environment variable)
2023/05/23 17:51:20 Max open files when downloading: 1048152 (auto-computed)
2023/05/23 17:51:20 Final job part has been created
2023/05/23 17:51:20 JobID=a50e4368-5fd0-534b-657d-70216b4960e2, credential type: Anonymous
2023/05/23 17:51:20 Final job part has been scheduled
2023/05/23 17:51:20 INFO: [P#0-T#0] Starting transfer: Source "/home/user/test/myfile.gz" Destination "https://mystorage.blob.core.windows.net/[...]02". Specified chunk size 8388608

The process works with smaller files but fails with large ones. I couldn't find a file size limit in the AzCopy settings, or any related limitation in the logs.


Solution

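  • The bare "Killed" with no error in the log suggests the process was terminated by the operating system, most likely because it ran out of memory. You can solve this by adjusting the following parameters:

    --block-size-mb (command-line flag): sets the chunk size, and

    AZCOPY_CONCURRENCY_VALUE (environment variable): sets the degree of parallelism, chosen based on the free RAM available and the block-size-mb value.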
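
As a rough sketch of how the two settings could be combined: the helper below derives a concurrency value by dividing a memory budget by the block size. The suggest_concurrency function, the 128 MB budget, and the <destination-SAS-URL> placeholder are illustrative assumptions, not part of AzCopy, and the division is a simple heuristic rather than AzCopy's own tuning logic. The script prints the resulting command instead of running it, so the values can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper (not part of AzCopy): estimate how many blocks can be
# in flight at once for a given memory budget. This is a rough heuristic.
suggest_concurrency() {
    ram_budget_mb=$1   # memory you are willing to spend on in-flight blocks
    block_size_mb=$2   # value that will be passed to --block-size-mb
    n=$((ram_budget_mb / block_size_mb))
    # Never go below one concurrent operation.
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

BLOCK_SIZE_MB=8      # assumed value; the log above reports an 8 MiB chunk size
RAM_BUDGET_MB=128    # assumed budget; adjust to the free RAM on your machine

AZCOPY_CONCURRENCY_VALUE=$(suggest_concurrency "$RAM_BUDGET_MB" "$BLOCK_SIZE_MB")
export AZCOPY_CONCURRENCY_VALUE

# Print the command for review; replace the placeholder with your own SAS URL.
echo "AZCOPY_CONCURRENCY_VALUE=$AZCOPY_CONCURRENCY_VALUE azcopy copy 'test' '<destination-SAS-URL>' --recursive --block-size-mb $BLOCK_SIZE_MB"
```

With the assumed values this yields a concurrency of 16, which is lower than the default of 32 reported in the log. If the transfer still gets killed, lowering either the budget or the block size further is the next thing to try.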