java azure-blob-storage azure-java-sdk

How to list only the files (as BlobItems / clients) residing in a BlobContainer: Azure Storage in Java


I have the following folder structure in Azure Blob Storage:

 -MainFolder
     -subFolder1
        -foo1.json
        -foo2.json
     -subFolder2
        -foo1.json
        -foo3.json
     -subFolder3
        -foo1.json
        -foo4.json

Is there a way to read/list only foo1.json from each of these folders? Here is the code I am using:

    BlobContainerClient containerClient = getBlobClient();
    ListBlobsOptions options = new ListBlobsOptions()
            .setPrefix(blobStoragePath)
            .setDetails(new BlobListDetails()
                    .setRetrieveMetadata(true)
                    .setRetrieveDeletedBlobs(true));

    PagedIterable<BlobItem> listVersion =
            containerClient.listBlobsByHierarchy("/", options, Duration.ofSeconds(30L));

The code loops through listVersion to get the BlobClient (for foo1.json) under each prefix. Unfortunately, in this case the BlobItem entries in listVersion don't carry the properties that would contain the last-modified date/time.

Each time a new request is made, I plan to compare the lastModified date/time of each foo1.json blob with the value previously saved for that blob in a HashMap, and to update the HashMap whenever the two differ.
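
A minimal sketch of that comparison, using only the JDK. The class name LastModifiedCache and the method updateIfChanged are hypothetical names chosen for illustration. In real code the timestamp would presumably come from blobItem.getProperties().getLastModified(), which returns an OffsetDateTime when the listing populates the properties.

```java
import java.time.OffsetDateTime;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper: remembers the last-modified time seen for each blob name.
final class LastModifiedCache {
    private final Map<String, OffsetDateTime> seen = new HashMap<>();

    /**
     * Stores the given timestamp for blobName and reports whether it differs
     * from the previously stored one. A blob seen for the first time counts
     * as changed.
     */
    public boolean updateIfChanged(String blobName, OffsetDateTime lastModified) {
        Objects.requireNonNull(blobName, "blobName");
        Objects.requireNonNull(lastModified, "lastModified");
        OffsetDateTime previous = seen.put(blobName, lastModified);
        // isEqual compares the instants, so differing offsets alone do not count as a change.
        return previous == null || !previous.isEqual(lastModified);
    }
}
```

With this shape, only the blobs for which updateIfChanged returns true would need to be downloaded again.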

I am trying to make as few calls to the blob service as possible.

If I use this method instead:

    PagedIterable<BlobItem> listVersionBlobClients = containerClient.listBlobs(options, null);

it lists all the BlobItems (every file: foo1, foo2, ...), which slows the process down, because I have to check for each item whether it is foo1 or not.

Any help or suggestion is much appreciated.


Solution

  • I tried this in my environment and got the results below:

    Code (listing blobs):

    import com.azure.storage.blob.BlobContainerClient;
    import com.azure.storage.blob.BlobServiceClient;
    import com.azure.storage.blob.BlobServiceClientBuilder;
    import com.azure.storage.blob.models.BlobItem;

    public class App {
        public static void main(String[] args) {
            // Account key redacted; never publish a real key.
            String connectStr = "DefaultEndpointsProtocol=https;AccountName=venkat123;AccountKey=<account-key>;EndpointSuffix=core.windows.net";
            BlobServiceClient blobServiceClient = new BlobServiceClientBuilder()
                    .connectionString(connectStr)
                    .buildClient();
            String containerName = "test";
            BlobContainerClient containerClient = blobServiceClient.getBlobContainerClient(containerName);

            // Flat listing: returns every blob in the container.
            System.out.println("\nListing blobs...");
            for (BlobItem blobItem : containerClient.listBlobs()) {
                System.out.println("\t" + blobItem.getName());
            }
        }
    }
    

    As Gaurav Mantri's comment says, you need to list the blobs first and then filter them on the client side, for example by file name.
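
    A minimal sketch of that client-side filter, using only the JDK so it runs without a storage account. The class BlobNameFilter and the method onlyNamed are hypothetical names. The input list stands in for the names you would collect with blobItem.getName() from a flat listing.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: keeps only blob names whose last path segment matches a file name.
final class BlobNameFilter {
    private BlobNameFilter() {
    }

    /** Returns the names that are exactly fileName or end with "/" + fileName. */
    public static List<String> onlyNamed(List<String> blobNames, String fileName) {
        return blobNames.stream()
                .filter(name -> name.equals(fileName) || name.endsWith("/" + fileName))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Names mirror the folder structure from the question.
        List<String> names = List.of(
                "MainFolder/subFolder1/foo1.json",
                "MainFolder/subFolder1/foo2.json",
                "MainFolder/subFolder2/foo1.json",
                "MainFolder/subFolder2/foo3.json",
                "MainFolder/subFolder3/foo1.json",
                "MainFolder/subFolder3/foo4.json");
        onlyNamed(names, "foo1.json").forEach(System.out::println);
    }
}
```

    Matching on "/" + fileName rather than on the bare suffix avoids picking up names such as "xfoo1.json".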


    You can use the method below; make sure logging is enabled on your Azure storage account:

    // client is a BlobContainerClient
    ListBlobsOptions options = new ListBlobsOptions()
        .setPrefix("prefixToMatch")
        .setDetails(new BlobListDetails()
            .setRetrieveDeletedBlobs(true)
            .setRetrieveSnapshots(true));
    // Placeholder value: use a token from a previous listing, or null to start from the beginning.
    String continuationToken = "continuationToken";
    Duration duration = Duration.ofMinutes(2);
    client.listBlobs(options, continuationToken, duration).forEach(blob ->
        System.out.printf("Name: %s, Directory? %b, Deleted? %b, Snapshot ID: %s%n",
            blob.getName(),
            blob.isPrefix(),
            blob.isDeleted(),
            blob.getSnapshot()));
    

    For another way of listing blobs, you can also refer to this SO thread by Josef Šustáček.

    Reference: Listing blobs with a special character in the blob name (github.com).