
ListBlobsSegmentedAsync suddenly stops returning files in a Blob directory even though Storage explorer shows files are there


I have tried different ways of listing the files in a folder, but ListBlobsSegmentedAsync returns 0 files, even though Azure Storage Explorer clearly shows files are present and the exact same code and configuration worked before.

Code used:

    var test = await directoryInfo.ListBlobsSegmentedAsync(new BlobContinuationToken());

    var fileList = await directoryInfo.ListBlobsSegmentedAsync(true, BlobListingDetails.None, take, null, null, null, new CancellationToken());

I am using storage accounts configured as Data Lake Storage Gen2.

The same thing happened with a storage account configured as standard Blob storage; in that case, renaming the folder made it work again. With Data Lake, renaming didn't help. In any case, renaming is not a viable workaround.

I have also tried different BlobListingDetails values; none of them made a difference.

The setup is as follows: a separate job uploads files into folders for processing, and this job lists the first X files in a folder and downloads them for further processing. This works for a while, but after some time (less than a day) ListBlobsSegmentedAsync starts returning 0 files. When I check the folder in Azure Storage Explorer, there are thousands of files in it, and based on the data processed that is the correct count.

EDIT:

Implemented with continuation token:

    var directoryInfo = _blobContainer.GetDirectoryReference(directory);

    BlobContinuationToken blobContinuationToken = null;
    var list = new List<IListBlobItem>();
    do
    {
        var resultSegment = await directoryInfo.ListBlobsSegmentedAsync(blobContinuationToken);

        // Get the value of the continuation token returned by the listing call.
        blobContinuationToken = resultSegment.ContinuationToken;
        list.AddRange(resultSegment.Results);
    }
    while (blobContinuationToken != null && list.Count < take); // Stop when there are no more pages or we have enough items.

    // The last segment can push the count past `take`, so trim the result.
    return list
        .Take(take)
        .Select(x => x?.Uri?.ToString())
        .Where(x => !string.IsNullOrEmpty(x))
        .ToList();

Solution

  • I don't think you should pass new BlobContinuationToken(); this may "confuse" the SDK. Pass null for the first call instead. Also, are you actually iterating over the results, i.e. checking the ContinuationToken returned by each call? The first page can legitimately be empty, so you have to check the token to find out whether more results may follow.

    From the related question "Why is ListBlobsSegmentedAsync only returning results on second page?":

    It's not at all unexpected that you can occasionally get empty pages, or pages with fewer than the maximum number of results, along with a continuation token.

    The Azure-Samples v11 upload/download sample pages through a container like this (https://github.com/Azure-Samples/azure-sdk-for-net-storage-blob-upload-download/blob/master/v11/Program.cs):

    BlobContinuationToken blobContinuationToken = null;
    
    do
    {
        var resultSegment = await cloudBlobContainer.ListBlobsSegmentedAsync(null, blobContinuationToken);
    
        // Get the value of the continuation token returned by the listing call.
        blobContinuationToken = resultSegment.ContinuationToken;
        foreach (IListBlobItem item in resultSegment.Results)
        {
            Console.WriteLine(item.Uri);
        }
    } 
    while (blobContinuationToken != null); // Loop while the continuation token is not null.
    

    In your case you may not want to loop until the token is null; instead, combine the token check with a count of the items returned so far.
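The loop described above (start with a null token, keep following continuation tokens through empty pages, and stop once enough items have been collected) can be sketched without an Azure dependency. This is a minimal simulation, not the Storage SDK: `Page`, `ListPageAsync`, and the fake page data are hypothetical stand-ins for `BlobResultSegment` and `ListBlobsSegmentedAsync`, chosen so that the listing contains empty pages that still carry a continuation token. The `take` parameter mirrors the one in the question.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for one listing segment: items plus a continuation token
// (null when there is nothing more to list).
public sealed record Page(IReadOnlyList<string> Items, string? NextToken);

public static class PagedListingDemo
{
    // Hypothetical fake service data: some pages are empty but are still
    // followed by more pages, like the behavior described in the answer.
    private static readonly List<IReadOnlyList<string>> FakePages = new()
    {
        new List<string>(),                        // empty first page, token still returned
        new List<string> { "a.csv", "b.csv" },
        new List<string>(),                        // another empty page
        new List<string> { "c.csv", "d.csv", "e.csv" },
    };

    // Hypothetical listing call: a null token means "start from the beginning".
    public static Task<Page> ListPageAsync(string? token)
    {
        int index = token is null ? 0 : int.Parse(token);
        string? next = index + 1 < FakePages.Count ? (index + 1).ToString() : null;
        return Task.FromResult(new Page(FakePages[index], next));
    }

    // Collect at most `take` items, following continuation tokens past empty pages.
    public static async Task<List<string>> TakeFirstAsync(int take)
    {
        var results = new List<string>();
        string? token = null; // first call uses null, not a freshly constructed token
        do
        {
            Page page = await ListPageAsync(token);
            results.AddRange(page.Items);
            token = page.NextToken;
        }
        while (token != null && results.Count < take);

        // The last page may overshoot the requested count, so trim it.
        return results.Take(take).ToList();
    }

    public static async Task Main()
    {
        var firstThree = await TakeFirstAsync(3);
        Console.WriteLine(string.Join(", ", firstThree)); // prints: a.csv, b.csv, c.csv
    }
}
```

Note that checking only the first page here would return nothing, even though five items exist. With the real SDK, the same loop shape should apply; one option is to also pass the remaining count (`take - list.Count`) as `maxResults` in the seven-argument overload used in the question, so that pages do not overshoot.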