I am currently exploring a solution to host large-scale sensor data (ranging from GBs to TBs weekly) in Microsoft Azure as part of an Open Data initiative. The key objective is to store and publicly share this unstructured data efficiently, both technically and economically.
Azure Blob Storage seems like a natural fit for this use case.
However, I have concerns regarding outbound data transfer costs. While inbound data (uploading to Azure) is free, Azure charges for egress (outbound) traffic when data is downloaded by users. Since these datasets could be accessed frequently by the public, and the size ranges from TBs to potentially PBs, my assumption is that the egress costs could make this approach economically unviable.
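Here is a back-of-envelope sketch of why egress dominates at this scale. It makes a simplifying assumption: each weekly dataset is downloaded in full a fixed number of times in the month it is published. It also takes the per-GB price as a parameter, because Azure bandwidth pricing is tiered by volume and varies by region. The 0.05 in the usage line is a placeholder, not a quoted Azure rate.

```python
WEEKS_PER_MONTH = 52 / 12  # average number of weeks in a month


def monthly_egress_tb(weekly_ingest_tb: float, downloads_per_dataset: int) -> float:
    """Egress volume (TB) if each new weekly dataset is downloaded in full
    `downloads_per_dataset` times in the month it is published.
    Downloads of older archive data would come on top of this."""
    return weekly_ingest_tb * downloads_per_dataset * WEEKS_PER_MONTH


def egress_cost(volume_tb: float, price_per_gb: float) -> float:
    """Flat-rate estimate; real Azure bandwidth pricing is tiered by volume."""
    return volume_tb * 1000 * price_per_gb  # treats 1 TB as 1,000 GB


if __name__ == "__main__":
    volume = monthly_egress_tb(weekly_ingest_tb=2, downloads_per_dataset=50)
    # 0.05 per GB is a PLACEHOLDER price, not a quoted Azure rate.
    print(f"{volume:.1f} TB/month, ~{egress_cost(volume, 0.05):,.0f} per month")
```

Even a modest 2 TB per week with 50 full downloads per dataset produces hundreds of TB of egress per month. Any per-GB price therefore multiplies into a large bill.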
Is there a way to reduce or avoid Azure egress costs for publicly shared Open Data using Blob Storage or other Azure services? (For example: special configurations, settings, or leveraging certain Azure pricing plans.)
Is it possible for clients/users downloading the data to cover the egress costs directly rather than the data owner?
Are there alternative Azure services or architectures better suited to hosting public Open Data with minimal or no egress costs?
My main goal is to minimize expenses related to serving the public with large-scale datasets while adhering to the principles of Open Data.
Is there a way to reduce or avoid Azure egress costs for publicly shared Open Data using Blob Storage or other Azure services? (For example: special configurations, settings, or leveraging certain Azure pricing plans.)
You can use Azure CDN to cache data closer to users, minimizing direct egress from Blob Storage. This reduces costs but does not eliminate them entirely, because data delivered from the CDN to users is still billed. Here is the document.
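The following is a rough way to see why a CDN does not remove the bill. Every byte users download still leaves through the CDN edge, while only cache misses are fetched from Blob Storage. The sketch takes the cache hit ratio and both per-GB prices as caller-supplied assumptions. Fill them in from current Azure pricing. Set `origin_price_per_gb` to 0 if origin-to-CDN transfer is not billed in your setup.

```python
from dataclasses import dataclass


@dataclass
class CdnCostEstimate:
    origin_gb: float    # cache misses fetched from Blob Storage
    cdn_cost: float     # delivery from the CDN edge to users
    origin_cost: float  # transfer from Blob Storage to the CDN

    @property
    def total(self) -> float:
        return self.cdn_cost + self.origin_cost


def estimate_cdn_cost(served_gb: float, cache_hit_ratio: float,
                      cdn_price_per_gb: float,
                      origin_price_per_gb: float) -> CdnCostEstimate:
    """Split monthly delivery cost between the CDN edge and the Blob origin.

    All prices are caller-supplied assumptions, not Azure list prices.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    # Only requests the CDN cannot answer from cache go back to the origin.
    origin_gb = served_gb * (1.0 - cache_hit_ratio)
    return CdnCostEstimate(
        origin_gb=origin_gb,
        cdn_cost=served_gb * cdn_price_per_gb,  # every byte served still costs this
        origin_cost=origin_gb * origin_price_per_gb,
    )
```

With a 90% hit ratio, only a tenth of the traffic is pulled from Blob Storage. However, `cdn_cost` still scales with everything users download. Large, rarely requested files also tend to have lower hit ratios than a small set of popular ones.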
Also, see this document about egress costs for public datasets that contribute to societal benefits, such as research or innovation.
Is it possible for clients/users downloading the data to cover the egress costs directly rather than the data owner?
Azure does not natively support passing egress costs on to end users.
As a workaround, you can issue users SAS (shared access signature) tokens for access after they have covered the data costs. Here is the document.
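Here is a minimal sketch of that workaround. It assumes you run your own payment step. The hypothetical `PaidDownloadGate` class tracks prepaid balances in cents and only hands out a token once the quoted cost is covered. Its per-GB price is an illustrative assumption, not an Azure rate, and charging users does not change what Azure bills you. `make_azure_read_sas` uses `generate_blob_sas` and `BlobSasPermissions` from the `azure-storage-blob` package. They are imported inside the function so the gating logic can be tried without the package installed.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict


def make_azure_read_sas(account_name: str, account_key: str,
                        container: str, blob: str, hours: int = 24) -> str:
    """Return a read-only SAS token for one blob, valid for `hours` hours.

    Needs the azure-storage-blob package; imported here so the gate below
    can be exercised without it.
    """
    from azure.storage.blob import BlobSasPermissions, generate_blob_sas

    return generate_blob_sas(
        account_name=account_name,
        container_name=container,
        blob_name=blob,
        account_key=account_key,
        permission=BlobSasPermissions(read=True),
        expiry=datetime.now(timezone.utc) + timedelta(hours=hours),
    )


@dataclass
class PaidDownloadGate:
    """Hypothetical prepaid gate: no token until the download is paid for."""
    issue_token: Callable[[str], str]       # blob name -> SAS token
    price_cents_per_gb: int = 8             # ASSUMPTION: illustrative price, not an Azure rate
    balances: Dict[str, int] = field(default_factory=dict)  # user -> prepaid cents

    def quote(self, size_bytes: int) -> int:
        # Cost in cents, using 1 GB = 10**9 bytes, rounded up to a whole cent.
        return -(-size_bytes * self.price_cents_per_gb // 10**9)

    def deposit(self, user: str, cents: int) -> None:
        self.balances[user] = self.balances.get(user, 0) + cents

    def request(self, user: str, blob: str, size_bytes: int) -> str:
        cost = self.quote(size_bytes)
        if self.balances.get(user, 0) < cost:
            raise PermissionError(f"{user} must prepay {cost} cents for {blob}")
        self.balances[user] -= cost
        return self.issue_token(blob)
```

In a real setup, `issue_token` could be `functools.partial(make_azure_read_sas, account_name, account_key, container)` with your storage account name, key, and container. The user then appends the returned token to the blob URL as a query string. Short expiry times limit how long a paid-for token can be shared.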
Are there alternative Azure services or architectures better suited to hosting public Open Data with minimal or no egress costs?
Other Azure services, such as Azure Data Share or Azure Data Lake, still incur egress charges, though they provide operational benefits.
If you need truly minimal costs, consider external platforms such as Internet Archive or Zenodo for hosting public datasets. Here is the document.