The storage consumption on our ADLS Gen2 account rose from 5 TB to 314 TB within 10 days and has remained steady since then. The account has just two containers: the $logs container and a container holding all the directories for data storage. The $logs container looks empty. I have tried running Folder Statistics in Azure Storage Explorer on the other container, and none of its directories appears to be anywhere near big enough to account for the consumption.
Interestingly, Folder Statistics had been running on one of the directories for a few hours, so I cancelled it. The partial result shown on cancellation was 200+ TB and 88k+ blobs. A visual inspection of the directory showed only a handful of blobs that would barely add up to 1 GB, and the directory had been present for months without issue. Regardless, I deleted the directory and checked the storage consumption a few hours later, but could not see any change.
This brings me to two questions:

1. Folder Statistics: could it show an incorrect partial result (in the case above it showed 200+ TB, whereas in reality the directory looked like barely 1 GB)? I have cancelled it on previous occasions too, and even the partial stats seemed plausible then.
2. I can run Folder Statistics in Azure Storage Explorer on each folder individually, but is there a better way to get the storage consumption in one go (at least broken down by directory and sub-directory; I suppose blob level would be overkill, but whatever works)? I have access to Databricks with a mount point to this container and can create a cluster with the required runtime if such code is specific to one.

Update: We found the cause of the increase. It was, in fact, a few copy activities created by our team. Interestingly, when we deleted them, it took about 48 hours before the storage graph actually started going down, although the files disappeared immediately. This was not a delay in refreshing the consumption graph; it genuinely took that long before we saw the expected sharp dip in storage. We raised a Microsoft case and they confirmed that such an amount of data can take time to actually delete in the background.
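For reference, this is roughly the kind of "one go" scan I had in mind for question 2. It is only a rough sketch, assuming the container is mounted at /mnt/datalake (a hypothetical mount point) and walking the tree recursively with dbutils.fs.ls, so it may be slow on directories containing a very large number of blobs:

```python
# Sketch for a Databricks notebook, where `dbutils` is predefined (no import needed).
# Assumes the container is mounted at /mnt/datalake; adjust to your own mount point.

def dir_size_bytes(path):
    """Recursively sum the sizes of all blobs under `path`."""
    total = 0
    for f in dbutils.fs.ls(path):
        if f.isDir():
            # Recurse into sub-directories.
            total += dir_size_bytes(f.path)
        else:
            total += f.size
    return total

mount_root = "/mnt/datalake"  # hypothetical mount point

# Print the size of each top-level directory in the container.
for entry in dbutils.fs.ls(mount_root):
    if entry.isDir():
        size_gb = dir_size_bytes(entry.path) / (1024 ** 3)
        print(f"{entry.name:<50} {size_gb:12.2f} GB")
```

Something along these lines would at least give a per-directory breakdown in a single pass instead of running Folder Statistics on each folder separately.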