azure-storage, azure-blob-storage

How to find if a file (blob) exists in a folder (virtual directory) inside a container


I want to check whether any file exists within a virtual directory inside my blob container. The directory may contain thousands of files, so I want to use the most efficient approach. If files do exist, I also want to list them.

My directory structure in the container is:

Folder/Subfolder1/file.txt

Folder/Subfolder2/file.txt

So here I want to detect whether there are files in Subfolder1. Since Azure Storage has a virtual directory structure, Subfolder1 won’t exist in the absence of blobs. Hence I want to set up a condition that checks whether Subfolder1 exists and, if so, lists all the files inside it.

Also, is there a better way to get a limited number of records from the GetBlobs(prefix: foldername) method? Say there are 500 files in my container and I want to list only 50. Currently I am using a foreach loop with a counter to list 50 files and break out of the loop at count 50:

    var inputfiles = ContainerClient.GetBlobsAsync(prefix: inputfolder);
    var count = 0;
    List<string> FilesList = new List<string>();

    await foreach (var blob in inputfiles)
    {
        // Add first, then count, so that exactly 50 names are collected.
        FilesList.Add(blob.Name);
        count++;
        if (count == 50)
            break;
    }

I am using Azure SDK v12 with C#.


Solution

  • Hence I want to set up a condition that checks whether Subfolder1 exists and, if so, lists all the files inside it.

    As mentioned in your question and my comments, a subfolder doesn't really exist unless there's a blob in it, because subfolders are virtual in Azure Blob Storage.

    What you can do is simply list the blobs inside the subfolder by specifying the subfolder path (Folder/Subfolder1/ in your case) as the blob prefix. If blobs are present, you will get a list of the blobs inside that subfolder; if not, you will get an empty collection. There is no need to check for existence first and then list the blobs, because in both cases you have to list the blobs.

    The above holds for Azure Storage accounts where the hierarchical namespace is disabled (i.e. non-Data Lake accounts). For Azure Data Lake Storage Gen2 accounts the approach would be different, as folders there are not virtual.
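
    As a minimal sketch of the approach above (one way to write it, not the only one), the listing itself can double as the existence check. The class and method names are hypothetical; only GetBlobsAsync and its prefix parameter come from the SDK. The trailing slash is appended on the assumption that Folder/Subfolder1 should not also match a sibling such as Folder/Subfolder10.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Azure.Storage.Blobs;

public static class VirtualFolder
{
    // Hypothetical helper: turns a folder path into a blob prefix that
    // always ends with "/" (assumption: sibling folders must not match).
    public static string ToPrefix(string folder)
    {
        return folder.EndsWith("/", StringComparison.Ordinal) ? folder : folder + "/";
    }

    // Hypothetical helper: returns the names of all blobs under the folder.
    // An empty list means the virtual folder does not exist.
    public static async Task<List<string>> ListFolderAsync(
        BlobContainerClient container, string folder)
    {
        var names = new List<string>();
        await foreach (var blob in container.GetBlobsAsync(prefix: ToPrefix(folder)))
        {
            names.Add(blob.Name);
        }
        return names;
    }
}
```

    The caller can then treat a Count of 0 as "Subfolder1 does not exist" and otherwise use the returned names directly, with no second request.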

    UPDATE

    Please change your code to something like below:

    var blobPages = ContainerClient.GetBlobsAsync(prefix: inputFolder).AsPages(pageSizeHint: 50);
    await foreach (var blobPage in blobPages)
    {
        foreach (var blob in blobPage.Values)
        {
            FilesList.Add(blob.Name);
        }
    }
    

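    Essentially, to limit the number of records per request, add AsPages(pageSizeHint: 50): each page will then contain at most 50 blobs from that container matching the prefix. Note that looping over every page, as above, still lists all matching blobs, so to get only the first 50 you have to stop after the first page.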

    UPDATE 2

    Please try the code below.

    // Request pages of at most 50 blobs and read only the first page.
    await using var blobPages = ContainerClient.GetBlobsAsync(prefix: inputFolder).AsPages(pageSizeHint: 50).GetAsyncEnumerator();
    if (await blobPages.MoveNextAsync())
    {
        foreach (var blob in blobPages.Current.Values)
        {
            FilesList.Add(blob.Name);
        }
    }
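
    If you would rather not drive the enumerator by hand, a small generic helper can cap the number of items taken from any async sequence. This is only a sketch: AsyncTake and TakeFirstAsync are hypothetical names, not part of the Azure SDK, and the helper relies solely on IAsyncEnumerable<T>, which the result of GetBlobsAsync implements.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AsyncTake
{
    // Hypothetical helper: collects at most `limit` items from an async
    // sequence and then stops enumerating it.
    public static async Task<List<T>> TakeFirstAsync<T>(
        IAsyncEnumerable<T> source, int limit)
    {
        var result = new List<T>();
        if (limit <= 0)
            return result;

        await foreach (var item in source)
        {
            result.Add(item);
            // Leaving the loop disposes the enumerator, so nothing
            // further is requested from the source.
            if (result.Count >= limit)
                break;
        }
        return result;
    }
}
```

    With the names from the question, the call would look like await AsyncTake.TakeFirstAsync(ContainerClient.GetBlobsAsync(prefix: inputFolder), 50). Keep in mind that this caps what you keep, not what the service sends: without a page size hint the first page fetched may hold more than 50 blobs, so the AsPages variant above is the one to use when the transfer size matters.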