
How to achieve time-slab based rollup in Druid


I have a use case where I want my data to be rolled up in Druid as follows:

  • data less than 90 days old should be rolled up hourly
  • data from the last 3-6 months should be rolled up daily
  • data from the last 6-12 months should be rolled up monthly
  • data from the last 1-3 years should be rolled up yearly

How can I achieve this? I tried using a mixed-type granularitySpec, but it didn't work.


Solution

  • Rollup is applied through tasks. This could be a streaming ingestion task (e.g. from Kafka, using a granularitySpec with rollup set to true and queryGranularity set to some time window) or a batch task (e.g. an ingestion that uses GROUP BY and truncates the timestamp).
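
As a rough sketch of the two forms of rollup just described, the Python below builds a granularitySpec fragment for a native ingestion spec and a SQL-based batch ingestion statement that truncates the timestamp and groups on it. The table names (`events`, `events_hourly`) and the columns (`channel`, `added`) are hypothetical, and the exact fields accepted may differ between Druid versions, so treat this as an illustration rather than a drop-in spec.

```python
import json


def rollup_granularity_spec(segment_granularity="DAY", query_granularity="HOUR"):
    """granularitySpec fragment with rollup enabled (values are illustrative)."""
    return {
        "type": "uniform",
        "segmentGranularity": segment_granularity,
        # Timestamps are truncated to this granularity before rows are combined.
        "queryGranularity": query_granularity,
        "rollup": True,
    }


def rollup_insert_sql(source, target, period="PT1H"):
    """Batch rollup via GROUP BY on a truncated timestamp.

    The "channel" and "added" columns are hypothetical placeholders.
    """
    return (
        f'INSERT INTO "{target}"\n'
        f"SELECT TIME_FLOOR(\"__time\", '{period}') AS \"__time\",\n"
        f'       "channel",\n'
        f'       SUM("added") AS "added"\n'
        f'FROM "{source}"\n'
        f"GROUP BY 1, 2\n"
        f"PARTITIONED BY DAY"
    )


spec = rollup_granularity_spec()
sql = rollup_insert_sql("events", "events_hourly")
print(json.dumps(spec, indent=2))
print(sql)
```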

    You can see this working in a streaming ingestion task in the notebook at https://github.com/implydata/learn-druid/blob/main/notebooks/02-ingestion/16-native-groupby-rollup.ipynb; there's also a notebook on batch GROUP BY ingestion.

    Therefore, to reshape your data, you need to run a task that reads from the table and writes the result either back into the same table or into a new one. There are three ways to do this inside Apache Druid:

    1. Use compaction, which reads from the table and writes back into the same table.
    2. Run a re-indexing job, again reading from and writing to the same table.
    3. Use one table as the source for a batch ingestion to another table.
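
For the compaction route, a task spec along the following lines could coarsen one interval of an existing table. This is a minimal sketch: the datasource name `events` and the interval are made up, the field names follow my understanding of the compaction task format, and you should check them against the documentation for your Druid version. The resulting JSON would be submitted to the Overlord's task endpoint (`/druid/indexer/v1/task`).

```python
import json


def compaction_task(datasource, interval, query_granularity, segment_granularity):
    """Build a compaction task that re-rolls one interval at a coarser granularity."""
    return {
        "type": "compact",
        "dataSource": datasource,
        "ioConfig": {
            "type": "compact",
            # ISO 8601 interval, start inclusive / end exclusive.
            "inputSpec": {"type": "interval", "interval": interval},
        },
        "granularitySpec": {
            "segmentGranularity": segment_granularity,
            "queryGranularity": query_granularity,
            "rollup": True,
        },
    }


# Hypothetical example: roll January 2023 of "events" up to daily rows.
task = compaction_task("events", "2023-01-01/2023-02-01", "DAY", "MONTH")
print(json.dumps(task, indent=2))
```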

    Check out https://github.com/implydata/learn-druid/blob/main/notebooks/05-operations/05-compaction-data.ipynb for a run-through of the compaction approach.

    I'm not aware of any functionality that will automatically change the data granularity over time, though there are people in the community who have built their own processes using these APIs. Typically these are scheduled to run overnight, and are specific about the time periods and aggregations according to need.
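
A scheduled process of that kind might start by working out which interval belongs to which tier. The sketch below does only that planning step for the tiers in the question, approximating a month as 30 days and a year as 365 days (an assumption). It does not talk to Druid, and a real job would probably also align the intervals to segment boundaries before submitting compaction or re-indexing tasks.

```python
from datetime import date, timedelta

# (maximum age in days, target queryGranularity), taken from the question's tiers.
# Month and year lengths are approximated as 30 and 365 days (an assumption).
TIERS = [
    (90, "HOUR"),
    (180, "DAY"),
    (365, "MONTH"),
    (3 * 365, "YEAR"),
]


def plan_rollups(today):
    """Return (interval, granularity) pairs for every tier coarser than the first.

    The first tier is left alone because ingestion already produces hourly rows.
    """
    jobs = []
    newer_bound = TIERS[0][0]
    for max_age, granularity in TIERS[1:]:
        start = today - timedelta(days=max_age)
        end = today - timedelta(days=newer_bound)
        jobs.append((f"{start.isoformat()}/{end.isoformat()}", granularity))
        newer_bound = max_age
    return jobs


jobs = plan_rollups(date(2024, 1, 1))
for interval, granularity in jobs:
    print(interval, granularity)
```

Each pair could then be fed into whichever task type you chose above, for example one compaction task per interval.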