
How to Minimize Egress Costs in Azure for Public Open Data Sharing


I am currently exploring a solution to host large-scale sensor data (ranging from GBs to TBs weekly) in Microsoft Azure as part of an Open Data initiative. The key objective is to store and publicly share this unstructured data efficiently, both technically and economically.

Azure Blob Storage seems a natural fit for this use case because of its:

  • Ability to store unstructured data.
  • Virtually unlimited storage capacity.
  • Pay-as-you-go pricing model, where costs are based only on the consumed storage.

However, I have concerns regarding outbound data transfer costs. While inbound data (uploading to Azure) is free, Azure charges for egress (outbound) traffic when data is downloaded by users. Since these datasets could be accessed frequently by the public, and the size ranges from TBs to potentially PBs, my assumption is that the egress costs could make this approach economically unviable.
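To put a rough number on this concern, here is a minimal back-of-the-envelope sketch. The per-GB rate is a placeholder assumption, not an actual Azure price: real egress pricing is tiered and varies by region, so the current bandwidth pricing page should be consulted before relying on any figure. The function name and the example workload are also purely illustrative.

```python
def monthly_egress_cost(dataset_gb: float, downloads_per_month: float,
                        rate_per_gb: float) -> float:
    """Estimate monthly egress cost when the full dataset is downloaded
    `downloads_per_month` times at a flat per-GB rate."""
    return dataset_gb * downloads_per_month * rate_per_gb


# ASSUMPTION: hypothetical flat rate in USD per GB, not a quoted Azure price.
ASSUMED_RATE_USD_PER_GB = 0.08

if __name__ == "__main__":
    # Illustrative workload: a 1 TB dataset (1024 GB) fetched in full 100 times a month.
    cost = monthly_egress_cost(1024, 100, ASSUMED_RATE_USD_PER_GB)
    print(f"Estimated egress: ${cost:,.2f} per month")
```

Because the cost scales linearly with both dataset size and download count, a popular multi-TB dataset can quickly dominate the storage bill itself.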

My questions are:

  1. Is there a way to reduce or avoid Azure egress costs for publicly shared Open Data using Blob Storage or other Azure services? (For example: special configurations, settings, or leveraging certain Azure pricing plans.)

  2. Is it possible for clients/users downloading the data to cover the egress costs directly rather than the data owner?

  3. Are there alternative Azure services or architectures better suited to hosting public Open Data with minimal or no egress costs?

My main goal is to minimize expenses related to serving the public with large-scale datasets while adhering to the principles of Open Data.


Solution

  • Is there a way to reduce or avoid Azure egress costs for publicly shared Open Data using Blob Storage or other Azure services? (For example: special configurations, settings, or leveraging certain Azure pricing plans.)

    • You can use Azure CDN to cache data closer to users, which minimizes direct egress from Blob Storage. This reduces costs but does not eliminate them entirely; see the Azure CDN documentation.

    • Also check Microsoft's documentation on egress costs for public datasets that contribute to societal benefits, such as research or innovation.
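As a rough illustration of why caching reduces but does not eliminate the cost, the sketch below estimates how much traffic still reaches Blob Storage for a given cache hit ratio. The function name, the 90% hit ratio, and the traffic volume are assumptions for illustration only; the CDN's own delivery to users is billed separately and is not modeled here.

```python
def origin_egress_gb(total_download_gb: float, cache_hit_ratio: float) -> float:
    """Data (in GB) still fetched from Blob Storage when a CDN absorbs cache hits."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    # Only cache misses reach the origin; hits are served from the CDN edge.
    return total_download_gb * (1.0 - cache_hit_ratio)


if __name__ == "__main__":
    # ASSUMPTION: 50 TB of monthly downloads and a 90% hit ratio, both illustrative.
    remaining = origin_egress_gb(50 * 1024, 0.90)
    print(f"Origin egress: {remaining:,.0f} GB of 51,200 GB requested")
```

Large, rarely repeated downloads tend to have low hit ratios, so the benefit depends heavily on how often the same files are requested.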

  • Is it possible for clients/users downloading the data to cover the egress costs directly rather than the data owner?

    • Azure does not natively support transferring egress costs to end users.

    • As a workaround, you can issue SAS tokens to users after they have covered the data costs through your own payment process. See the shared access signature (SAS) documentation.
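A minimal sketch of the SAS approach, assuming the `azure-storage-blob` Python SDK (v12) is installed and that your own system has already confirmed payment. The helper names `build_blob_url` and `make_download_url`, the 24-hour validity, and all account, container, and blob names are hypothetical. Note that the SAS only gates access: Azure still bills the egress to the storage account owner.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote


def build_blob_url(account_name: str, container_name: str,
                   blob_name: str, sas_token: str) -> str:
    """Combine the public blob endpoint with a SAS token into a download URL."""
    path = f"{quote(container_name)}/{quote(blob_name)}"
    return f"https://{account_name}.blob.core.windows.net/{path}?{sas_token}"


def make_download_url(account_name: str, account_key: str, container_name: str,
                      blob_name: str, valid_hours: int = 24) -> str:
    """Issue a read-only, time-limited SAS URL for one blob.

    ASSUMPTION: call this only after your own billing step has succeeded.
    """
    # Imported lazily so this module loads even where the SDK is not installed.
    from azure.storage.blob import BlobSasPermissions, generate_blob_sas

    sas_token = generate_blob_sas(
        account_name=account_name,
        container_name=container_name,
        blob_name=blob_name,
        account_key=account_key,
        permission=BlobSasPermissions(read=True),  # download only
        expiry=datetime.now(timezone.utc) + timedelta(hours=valid_hours),
    )
    return build_blob_url(account_name, container_name, blob_name, sas_token)
```

Short expiry times limit how widely a paid link can be reshared, at the cost of users needing a new link if a large download is interrupted.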

  • Are there alternative Azure services or architectures better suited to hosting public Open Data with minimal or no egress costs?

    • Other Azure services, such as Azure Data Share or Azure Data Lake, still incur egress charges, though they provide operational benefits.

    • If you need truly minimal costs, consider external platforms such as the Internet Archive or Zenodo for hosting public datasets.