Background:
I have an Airflow job that collects data (market share of the industry in the last 24 hours) from third-party endpoints every 10 minutes. I currently store the data in S3. We want to use this data to calculate our market share. I plan to keep the data in Data Warehouse as a time series standalone table( not a dimension or fact) to be used in queries to compare it with our data (stored in a star schema)to get daily market share.
Questions:
Is the data warehouse the right place to store time-series data? Can we store standalone tables similar to this? Does a table need to be part of the star schema to keep it in Data Warehouse?
Snowflake has no restrictions on its use to store time series data, the data does not need to be stored in a star schema. You have to consider how you load this data, it depends on how quickly you want to see it in the data warehouse. You can use the Snowpipe functionality and load data on the fly.
The presentation: Snowpipe: Load data fast, analyze even faster
Documentation: Introduction to Snowpipe
Or, load data at specified intervals using the STREAM and TASK functionality. Set e.g. a cyclic task every 10 minutes and download data if STREAM detected any new data on AWS S3.
The presentation: Randy Pitcher Streams & Tasks Hands on Lab
Documentation: Introduction to Tasks and CREATE STREAM