Search code examples
azurefactazure-stream-analyticsazure-data-factorycortana-intelligence

Is Azure Stream Analytics suitable for generating data warehouse fact and dimensions tables?


I have the following scenario that I am thinking of implementing via Azure Stream Analytics.

My input consists of:

  1. Events streaming in from an Azure Event Hub.
  2. Reference data that relates to the events. Some of this data is "slowly changing" from day to day.

I need to join the events and the reference data, process them and output tables that will make up a "data warehouse" (with Power BI in mind as the consumer).

The output would be made up of:

  1. A facts table where the most important events are stored.
  2. A few dimension tables that hold the values that make up the facts.

Is Azure Stream Analytics suitable for this kind of work? It seems to me that ASA is well suited to persisting the events from the event hub stream and into a facts table. However, the additional work of keeping the dimensions tables up to date - i.e. adding new values periodically - is not a good fit.

Am I correct in this analysis? Should I switch over to Azure Data Factory for my project?


Solution

  • Is Azure Stream Analytics suitable for this kind of work? not really.Stream Analytics is made for Real-time stream processing in the cloud.

    As you pointed out, the additional work of keeping the dimensions tables up to date - i.e. adding new values periodically - is not a suitable job for ASA (Azure Stream Analytics)

    For this part/functionality you can use Azure Data Factory (ADF), it is a cloud-based data integration service that orchestrates and automates the movement and transformation of data. which is what you require here.

    Also, Azure HDInsight Hadoop-based enables ETL scenarios, you can check this example of "ETL Workflow for the racecar telemetry data" over here https://msdn.microsoft.com/en-us/library/dn749886.aspx