apache-kafka apache-drill

Using Apache Drill to query Kafka


I have been trying to figure out how to use Apache Drill to query Kafka topics with SQL. Can someone give me a starting point for connecting Drill to Kafka? Any help would be appreciated.


Solution

  • Support for querying Kafka was added in Drill 1.12. I haven't used it myself, but here is a quick outline of the configuration required. If you run into issues, please contact us on Drill's mailing list (http://drill.apache.org/mailinglists/). We can help you debug the problem and then post the results here.

    The general outline of what you need to do is the following:

    1. Create a storage plugin in Drill's web UI and name it kafka. The Kafka consumer settings go under kafkaConsumerProps:

      {
        "type": "kafka",
        "kafkaConsumerProps": {
          "bootstrap.servers": "broker_1:port1,broker_2:port2",
          "group.id": "drill-consumer-group-1"
        },
        "enabled": true
      }
      
    2. After creating the plugin configuration, set the appropriate Kafka message reader for your session: ALTER SESSION SET `store.kafka.record.reader` = 'org.apache.drill.exec.store.kafka.decoders.JsonMessageReader';
    3. Also set a poll timeout that works for your query: ALTER SESSION SET `store.kafka.poll.timeout` = 200;
    4. Try out a query: SELECT * FROM kafka.myTopic;
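
For anyone who would rather script these steps than click through the web UI, here is a minimal Python sketch. It is untested against a live cluster and rests on several assumptions: that a Drillbit is reachable at http://localhost:8047 (the default web port), that Drill's REST API accepts a plugin definition via POST to /storage/{name}.json, and that the plugin config uses the "type" and "kafkaConsumerProps" keys shown in Drill's Kafka plugin documentation. The helper names (kafka_plugin_payload, session_statements, register_plugin) are hypothetical and exist only for this example. The LIMIT clause is an addition to keep a first test query small.

```python
import json
import urllib.request

# Assumption: a local Drillbit with the web UI / REST API on the default port.
DRILL_URL = "http://localhost:8047"


def kafka_plugin_payload(bootstrap_servers, group_id, name="kafka"):
    """Build the JSON body that describes the Kafka storage plugin (step 1)."""
    return {
        "name": name,
        "config": {
            "type": "kafka",
            "kafkaConsumerProps": {
                "bootstrap.servers": bootstrap_servers,
                "group.id": group_id,
            },
            "enabled": True,
        },
    }


def session_statements(topic, poll_timeout=200, limit=10):
    """Return the SQL statements for steps 2-4, in the order to run them."""
    return [
        "ALTER SESSION SET `store.kafka.record.reader` = "
        "'org.apache.drill.exec.store.kafka.decoders.JsonMessageReader'",
        f"ALTER SESSION SET `store.kafka.poll.timeout` = {int(poll_timeout)}",
        # Backticks keep topic names with dashes or dots valid as identifiers.
        f"SELECT * FROM kafka.`{topic}` LIMIT {int(limit)}",
    ]


def register_plugin(payload, base_url=DRILL_URL):
    """POST the plugin definition to Drill. Needs a running Drillbit."""
    request = urllib.request.Request(
        f"{base_url}/storage/{payload['name']}.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints what would be sent; call register_plugin(payload) to apply it.
    payload = kafka_plugin_payload(
        "broker_1:port1,broker_2:port2", "drill-consumer-group-1"
    )
    print(json.dumps(payload, indent=2))
    for statement in session_statements("myTopic"):
        print(statement + ";")
```

The statements from session_statements are meant to be run in a single session (for example in sqlline or over one JDBC connection), since ALTER SESSION only affects the session that issues it.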