Tags: java, azure, azure-blob-storage, azure-java-sdk, azure-sdk-for-java

Upload of large files using azure-sdk-for-java with limited heap


We are developing a document microservice that needs to use Azure as storage for file content. Azure Block Blob storage seemed like a reasonable choice. The document service has its heap limited to 512 MB (-Xmx512m).

I was not able to get a streaming file upload with a limited heap to work using azure-storage-blob:12.10.0-beta.1 (also tested on 12.9.0).

The following approaches were attempted:

  1. Copy-paste from the documentation using BlockBlobClient:
BlockBlobClient blockBlobClient = blobContainerClient.getBlobClient("file").getBlockBlobClient();

File file = new File("file");

try (InputStream dataStream = new FileInputStream(file)) {
  blockBlobClient.upload(dataStream, file.length(), true /* overwrite file */);
}

Result: java.io.IOException: mark/reset not supported. The SDK tries to mark/reset the stream even though FileInputStream reports that it does not support this feature.
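The mark/reset mismatch can be checked up front, independently of the SDK: FileInputStream reports markSupported() as false, while wrapping it in BufferedInputStream flips it to true (which is why the next approach gets past this error). A minimal local check:

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class MarkResetCheck {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("upload-test", ".bin");
        Files.write(tmp, new byte[] {1, 2, 3});

        try (InputStream raw = new FileInputStream(tmp.toFile())) {
            // FileInputStream cannot rewind, hence the SDK's IOException
            System.out.println("FileInputStream: " + raw.markSupported());          // false
        }
        try (InputStream buffered = new BufferedInputStream(new FileInputStream(tmp.toFile()))) {
            // BufferedInputStream adds mark/reset, but only by buffering in heap
            System.out.println("BufferedInputStream: " + buffered.markSupported()); // true
        }
        Files.delete(tmp);
    }
}
```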

  2. Adding BufferedInputStream to mitigate the mark/reset issue (per advice):
BlockBlobClient blockBlobClient = blobContainerClient.getBlobClient("file").getBlockBlobClient();

File file = new File("file");

try (InputStream dataStream = new BufferedInputStream(new FileInputStream(file))) {
  blockBlobClient.upload(dataStream, file.length(), true /* overwrite file */);
}

Result: java.lang.OutOfMemoryError: Java heap space. I assume the SDK attempted to load all 1.17 GB of file content into memory.
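A plausible explanation for the OutOfMemoryError: to honor reset(), BufferedInputStream must keep every byte read since mark() in its internal buffer, so if the caller marks the stream with a read limit covering the whole upload, the buffer grows toward the full file size. This behavior can be demonstrated locally (with a 1 MB stand-in for the 1.17 GB file):

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class BufferGrowthDemo {
    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[1_000_000]; // stand-in for the 1.17 GB file
        Arrays.fill(payload, (byte) 42);

        InputStream in = new BufferedInputStream(new ByteArrayInputStream(payload));
        in.mark(Integer.MAX_VALUE); // a read limit covering the whole stream

        byte[] first = in.readAllBytes();  // everything read is retained for reset()
        in.reset();                        // rewinding works...
        byte[] second = in.readAllBytes(); // ...because all 1,000,000 bytes sit in heap

        System.out.println(Arrays.equals(first, second)); // true
    }
}
```

Scaled up to the actual file, that retained buffer alone exceeds the 512 MB heap.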

  3. Replacing BlockBlobClient with BlobClient and removing the heap size limitation (-Xmx512m):
BlobClient blobClient = blobContainerClient.getBlobClient("file");

File file = new File("file");

try (InputStream dataStream = new FileInputStream(file)) {
  blobClient.upload(dataStream, file.length(), true /* overwrite file */);
}

Result: 1.5 GB of heap memory used; all file content is loaded into memory, plus some buffering on the Reactor side.

Heap usage from VisualVM

  4. Switching to streaming via BlobOutputStream:
long blockSize = DataSize.ofMegabytes(4L).toBytes();

BlockBlobClient blockBlobClient = blobContainerClient.getBlobClient("file").getBlockBlobClient();

// create / erase blob
blockBlobClient.commitBlockList(List.of(), true);

BlockBlobOutputStreamOptions options = (new BlockBlobOutputStreamOptions()).setParallelTransferOptions(
  (new ParallelTransferOptions()).setBlockSizeLong(blockSize).setMaxConcurrency(1).setMaxSingleUploadSizeLong(blockSize));

try (InputStream is = new FileInputStream("file")) {
  try (OutputStream os = blockBlobClient.getBlobOutputStream(options)) {
    IOUtils.copy(is, os); // uses 8KB buffer
  }
}

Result: the file is corrupted during upload. The Azure web portal shows 1.09 GB instead of the expected 1.17 GB. Manually downloading the file from the Azure web portal confirms that the content was corrupted during upload. The memory footprint decreased significantly, but file corruption is a showstopper.
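For what it's worth, IOUtils.copy itself is just a fixed-size buffer loop, so the client-side copy runs in constant memory regardless of file size. A hand-rolled equivalent that also counts bytes (the names here are illustrative, not from any SDK) can rule the copy out as the source of the missing ~80 MB, leaving the output stream implementation as the suspect:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public final class CountingCopy {
    // Pumps bytes through a fixed 8 KB buffer, like IOUtils.copy.
    // Memory use is constant regardless of stream size, and the returned
    // count can be compared against file.length() to rule the copy out
    // as the source of the truncation.
    public static long copyCounted(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8 * 1024];
        long total = 0;
        for (int n; (n = != -1; ) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copyCounted(new ByteArrayInputStream(data), sink);
        System.out.println(copied); // 100000
    }
}
```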

Problem: I cannot come up with a working upload/download solution with a small memory footprint.

Any help would be greatly appreciated!


Solution

  • Please try the code below to upload/download big files. I have tested it on my side using a .zip file of about 1.1 GB.

    For uploading files:

    public static void uploadFilesByChunk() {
        String connString = "<conn str>";
        String containerName = "<container name>";
        String blobName = "UploadOne.zip";
        String filePath = "D:/temp/" + blobName;

        BlobServiceClient client = new BlobServiceClientBuilder().connectionString(connString).buildClient();
        BlobClient blobClient = client.getBlobContainerClient(containerName).getBlobClient(blobName);
        long blockSize = 2 * 1024 * 1024; // 2 MB
        ParallelTransferOptions parallelTransferOptions = new ParallelTransferOptions()
                .setBlockSizeLong(blockSize).setMaxConcurrency(2)
                .setProgressReceiver(new ProgressReceiver() {
                    @Override
                    public void reportProgress(long bytesTransferred) {
                        System.out.println("uploaded:" + bytesTransferred);
                    }
                });

        BlobHttpHeaders headers = new BlobHttpHeaders().setContentLanguage("en-US").setContentType("binary");

        blobClient.uploadFromFile(filePath, parallelTransferOptions, headers, null, AccessTier.HOT,
                new BlobRequestConditions(), Duration.ofMinutes(30));
    }
    

    Memory footprint: (screenshot)

    For downloading files:

    public static void downloadFilesByChunk() {
        String connString = "<conn str>";
        String containerName = "<container name>";
        String blobName = "UploadOne.zip";

        String filePath = "D:/temp/" + "DownloadOne.zip";

        BlobServiceClient client = new BlobServiceClientBuilder().connectionString(connString).buildClient();
        BlobClient blobClient = client.getBlobContainerClient(containerName).getBlobClient(blobName);
        long blockSize = 2 * 1024 * 1024; // 2 MB
        com.azure.storage.common.ParallelTransferOptions parallelTransferOptions = new com.azure.storage.common.ParallelTransferOptions()
                .setBlockSizeLong(blockSize).setMaxConcurrency(2)
                .setProgressReceiver(new com.azure.storage.common.ProgressReceiver() {
                    @Override
                    public void reportProgress(long bytesTransferred) {
                        System.out.println("downloaded:" + bytesTransferred);
                    }
                });

        BlobDownloadToFileOptions options = new BlobDownloadToFileOptions(filePath)
                .setParallelTransferOptions(parallelTransferOptions);
        blobClient.downloadToFileWithResponse(options, Duration.ofMinutes(30), null);
    }
    

    Memory footprint: (screenshot)

    Result: (screenshot)
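    Given the corruption seen in the question (1.09 GB arriving instead of 1.17 GB), it may be worth verifying the round-trip by comparing checksums of the source file and the downloaded copy. A small streaming MD5 helper, independent of the Azure SDK (the paths in main are just illustrative temp files):

    ```java
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public final class FileChecksum {
        // Streams the file through MD5 in 8 KB chunks, so memory use stays
        // flat even for multi-gigabyte files.
        public static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            try (InputStream in = Files.newInputStream(file)) {
                byte[] buffer = new byte[8 * 1024];
                for (int n; (n = != -1; ) {
                    md5.update(buffer, 0, n);
                }
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        }

        public static void main(String[] args) throws Exception {
            // Stand-ins for the uploaded source and the downloaded copy;
            // equal digests mean the round-trip preserved every byte.
            Path source = Files.createTempFile("src", ".bin");
            Path downloaded = Files.createTempFile("dl", ".bin");
            Files.write(source, new byte[] {1, 2, 3});
            Files.write(downloaded, new byte[] {1, 2, 3});
            System.out.println(md5Hex(source).equals(md5Hex(downloaded)) ? "intact" : "corrupted");
        }
    }
    ```

    In practice you would run md5Hex over the original (e.g. D:/temp/UploadOne.zip) and the file written by downloadToFileWithResponse, and compare the two digests.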