Why is Apache Druid considered a real-time database?

This question is about how Druid is marketed.

Why is it called a real-time database when, as I understand it, data can only be read efficiently after a heavy ETL step in an external tool (such as Hive or Spark)? That step loads semi-aggregated data into Druid, which then writes the input in an efficient, column-store format.

My understanding is that Druid can be considered real-time in terms of communication between Druid and the querying UI, but not between the source of truth (including real-time transactions) and Druid, because of the analytics (possibly multiple joins) required in between.

Solution

Druid supports real-time ingestion from Kafka streams, and the ingested data is available to query immediately. That is why it is considered a real-time data store.

Druid also supports batch ingestion using Hive and Spark, as you mentioned.

Here are more details on Apache Druid:

Apache Druid is an OLAP data store designed to provide sub-second query performance while ingesting data in real time or in batch.

Ways to ingest data into Druid:

Real-time ingestion - Druid can consume Kafka topics to ingest data in real time.
Batch ingestion - Druid can use Hive and Spark to read datasets from HDFS. This path is not real time, but some use cases do not need real-time data and only require fast response times for ad hoc queries.
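
As a rough illustration of the real-time path, Kafka ingestion is started by submitting a supervisor spec to Druid's supervisor endpoint (/druid/indexer/v1/supervisor). The sketch below builds a minimal spec in Python. The datasource name, topic, broker address, and column names are all hypothetical, and a real deployment would likely need more tuning options than shown here.

```python
import json
import urllib.request


def build_kafka_supervisor_spec(datasource, topic, bootstrap_servers):
    """Return a minimal Kafka supervisor spec as a plain dict.

    The column names ("timestamp", "device_id", "metric") are hypothetical
    placeholders for whatever fields the Kafka messages actually carry.
    """
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"dimensions": ["device_id", "metric"]},
                "granularitySpec": {
                    "segmentGranularity": "HOUR",
                    "queryGranularity": "NONE",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "useEarliestOffset": True,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


def submit_supervisor(spec, base_url):
    """POST the spec to the supervisor endpoint (needs a running cluster)."""
    request = urllib.request.Request(
        base_url + "/druid/indexer/v1/supervisor",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Build and print the spec; nothing is sent over the network here.
spec = build_kafka_supervisor_spec("iot_events", "iot-events", "localhost:9092")
print(json.dumps(spec, indent=2))
```

Once a supervisor like this is running, Druid reads from the topic continuously, so no separate ETL job is needed for events to become queryable. Calling submit_supervisor(spec, "http://localhost:8888") would send it to a local cluster, assuming one is listening at that address.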

Where Druid is a great fit:

Applications with event-based data
Data that is rarely updated
Sub-second response time requirements
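
To make the event-data case concrete, here is a small sketch of the kind of time-bucketed aggregation Druid is typically asked to answer quickly. It builds a request body for Druid's SQL endpoint (/druid/v2/sql). The datasource name "iot_events" and the base URL are hypothetical, and the query is only an example of the pattern, not a benchmark.

```python
import json
import urllib.request


def build_sql_request(datasource, hours):
    """Build a Druid SQL request body counting events per minute.

    `datasource` is a hypothetical table name; `hours` is how far back to look.
    """
    query = (
        "SELECT TIME_FLOOR(__time, 'PT1M') AS minute_bucket, "
        "COUNT(*) AS event_count "
        f'FROM "{datasource}" '
        f"WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '{int(hours)}' HOUR "
        "GROUP BY 1 ORDER BY 1"
    )
    return {"query": query}


def run_sql(payload, base_url):
    """POST the query to the SQL endpoint (needs a running cluster)."""
    request = urllib.request.Request(
        base_url + "/druid/v2/sql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Build and print the request body; nothing is sent over the network here.
payload = build_sql_request("iot_events", 1)
print(payload["query"])
```

The query filters and groups on __time, the timestamp column every Druid datasource has, which is why append-mostly, time-stamped event data suits Druid well.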

When you should not consider Druid:

Queries that need a high number of joins
Data that is updated frequently

Hot industries/applications for Druid:

IoT services
Network monitoring
Digital marketing
Any time-based streaming application