postgresql clickhouse druid

How to access a database using Apache Druid


Colleagues, is it possible to set up some kind of proxying in Apache Druid? I need to run SQL queries in Apache Druid that access PostgreSQL and ClickHouse databases. I know that Apache Druid stores its metadata in PostgreSQL, but how do I configure it so that datasources can be created from those databases?

So, the question: how can I access a database through Apache Druid using proxying?

My attempts to get Apache Druid working with these databases have not been successful so far. Is this plan actually feasible? If there are no options, we will hook up S3 instead.


Solution

  • The EXTERN function lets a Druid SQL query read data that lives outside of Druid:

    https://druid.apache.org/docs/latest/multi-stage-query/concepts#read-external-data-with-extern

    You can use it with any supported input source (the "inputSource" of an ingestion spec):

    https://druid.apache.org/docs/latest/ingestion/input-sources

    Note that Apache Druid is built for ingesting streams and acting as a super-fast GROUP BY calculator, rather than as a "proxy" in front of other databases.

    So, as an alternative, you could look at Trino / PrestoDB as your "proxy" for querying several databases through one SQL endpoint:

    https://trino.io/episodes/16.html
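
As a rough illustration of the EXTERN approach above, here is a minimal Python sketch that builds a Druid SQL statement reading from PostgreSQL through the "sql" input source. Several things here are assumptions, not taken from the answer: the helper names (`sql_literal`, `build_extern_query`, `submit`), the host names, credentials, table and column names, the use of `{"type": "json"}` as the input-format argument that EXTERN expects, and the Router address. The "sql" input source is documented for PostgreSQL and MySQL and needs the matching Druid extension loaded; as far as I know it does not cover ClickHouse, so check the input-sources page before relying on it.

```python
import json
import urllib.request


def sql_literal(obj):
    """Serialize obj to JSON and quote it as a SQL string literal."""
    # Single quotes inside a SQL string literal are escaped by doubling them.
    return "'" + json.dumps(obj).replace("'", "''") + "'"


def build_extern_query(connect_uri, user, password, source_sql, columns):
    """Build a SELECT over EXTERN that reads rows from PostgreSQL.

    columns is a list of (name, druid_type) pairs describing the result.
    """
    input_source = {
        "type": "sql",
        "database": {
            "type": "postgresql",
            "connectorConfig": {
                "connectURI": connect_uri,
                "user": user,
                "password": password,
            },
        },
        "sqls": [source_sql],
    }
    # Assumption: EXTERN requires an input-format argument; the "sql"
    # input source has a fixed format, so this is only a placeholder.
    input_format = {"type": "json"}
    signature = [{"name": name, "type": typ} for name, typ in columns]
    return (
        "SELECT *\n"
        "FROM TABLE(\n"
        "  EXTERN(\n"
        f"    {sql_literal(input_source)},\n"
        f"    {sql_literal(input_format)},\n"
        f"    {sql_literal(signature)}\n"
        "  )\n"
        ")"
    )


def submit(query, router_url="http://localhost:8888"):
    """POST the query to the multi-stage query task endpoint (not called here)."""
    request = urllib.request.Request(
        router_url + "/druid/v2/sql/task",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Hypothetical connection details and table, for illustration only.
query = build_extern_query(
    "jdbc:postgresql://db.example.com:5432/shop",
    "druid",
    "secret",
    "SELECT id, name FROM customers",
    [("id", "long"), ("name", "string")],
)
print(query)
```

Calling `submit(query)` would send the statement to a running cluster as a multi-stage query task. Prefixing the statement with `INSERT INTO ... ` and adding a `PARTITIONED BY` clause would turn the read into an ingestion, which matches how Druid is normally used: the data is copied into a Druid datasource rather than proxied on every query.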