Tags: ruby-on-rails, azure-storage, azure-blob-storage, rails-activestorage

Number of Blobs is greater than number of uploaded files


TL;DR: Solved. This is not an issue; my investigation was wrong.

I'm using an Azure Storage Account to store files uploaded via a web application. It is set up so that only files uploaded through the web application are stored in the Storage Account container. There is no other way for files to appear in the container: no direct manipulation of the container, no logs, nothing else. Just uploads.

Because the web application (Ruby on Rails 6.1, uploading via the ActiveStorage lib) creates a row in an SQL table for every uploaded file, it's easy for me to get the total number of uploaded files:

ActiveStorage::Blob.count
=> 437538

So I should have around 437 538 files in the Azure Blob Storage container.

When I checked the total number of files in the web UI (Storage Explorer (preview) > container > more > folder statistics), it says I have around 1 133 000 blobs.


I did the same in the Azure Storage desktop app: it reports around 1 132 000 blobs.


So I'm expecting around 437k files, yet I have 1.3 million blobs. Note: the uploaded files are not large (images or PDFs up to 50 MB).

My question: does this mean that files are not mapped 1:1 to blobs? If so, how can I translate the blob count into a file count?

I thought these might be snapshots, but the web UI clearly says "not including snapshots".

Yes, some records could have been deleted from the SQL table and not from Azure Blob Storage, but I would expect at most about 1000 such cases (due to the nature of the application: it's more like an archive where you don't delete anything).
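One way to test the "orphaned blobs" idea is to compare the blob names in the container against the keys stored in the SQL table. The following is only a minimal sketch in plain Ruby with made-up keys. The `orphaned_keys` helper is hypothetical, and it assumes you have already exported both lists (in a Rails console the database side would typically come from `ActiveStorage::Blob.pluck(:key)`).

```ruby
require "set"

# Hypothetical helper: returns the blob names that exist in the container
# but have no matching key in the SQL table.
def orphaned_keys(container_names, db_keys)
  container_names.to_set - db_keys.to_set
end

# Made-up sample data, for illustration only.
container_names = ["abc123", "def456", "zzz999"]
db_keys         = ["abc123", "def456"]

orphans = orphaned_keys(container_names, db_keys)
puts "blobs in container without an SQL row: #{orphans.size}"
orphans.each { |name| puts "  #{name}" }
```

If the number of orphans is far larger than the handful of expected deletions, the names themselves usually hint at where the extra blobs come from.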


Solution

  • OK, my entire investigation was wrong! (My apologies.)

    Short answer: yes, the blob count maps 1:1 to the file count.

    The reason I have 1.3 million blobs is that I didn't consider the thumbnails generated by the web application (my bad).

    Full details (for Ruby on Rails developers):

    The way the ActiveStorage lib works is that when you upload a file, an SQL row is created in the ActiveStorage::Blob table. So there is a 1:1 mapping between files uploaded to Azure and SQL records. At this point I would have 436k files == 436k SQL records.

    But when an ActiveStorage variant is requested (thumbnail, preview, ...), the lib generates the resized image and uploads it to Azure Blob Storage as a cache for future use. That means that at this point you have 436k SQL rows, but 436k files + 1 file (the thumbnail) in the container.

    Now consider many different thumbnail sizes and you quickly end up with an extra 864k files.

    I hope this will help someone in the future
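To reconcile the two counts, the blob names in the container can be split into originals and variants. The sketch below is plain Ruby over a made-up list of names, not a definitive implementation. It assumes variants are not tracked in the database and are therefore stored under the `variants/<blob key>/<digest>` prefix, which is how ActiveStorage names them when variant tracking is off. If your app tracks variants, they get their own rows and random keys, and this prefix check will not apply. The helper names are hypothetical.

```ruby
# Assumption: untracked ActiveStorage variants live under this prefix.
VARIANT_PREFIX = "variants/"

# Splits container blob names into original uploads and cached variants.
def split_originals_and_variants(blob_names)
  variants, originals = blob_names.partition { |name| name.start_with?(VARIANT_PREFIX) }
  { originals: originals, variants: variants }
end

# Counts how many cached variants exist per original blob key.
# A variant name looks like "variants/<blob key>/<digest>".
def variants_per_original(variant_names)
  variant_names
    .group_by { |name| name.split("/")[1] }
    .transform_values(&:count)
end

# Made-up sample of blob names, for illustration only.
sample = [
  "abc123",
  "def456",
  "variants/abc123/1111",
  "variants/abc123/2222",
  "variants/def456/1111",
]

result = split_originals_and_variants(sample)
puts "originals: #{result[:originals].size}"
puts "variants:  #{result[:variants].size}"
variants_per_original(result[:variants]).each do |key, count|
  puts "#{key}: #{count} variant(s)"
end
```

With real data, the number of originals should be close to `ActiveStorage::Blob.count`, and the variants should account for the remainder of the blob count shown in Storage Explorer.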