
Presto query engine with Azure Data Lake


I need to deploy a Presto server that can query data stored in ADLS in Avro format. I have gone through this tutorial, and it seems that Hive is used as the catalog/connector in Presto to query ADLS. Can I bypass Hive and use some other connector to extract data from ADLS?


Solution

  • Can I bypass Hive and have any connector to extract data from ADLS?

    No.

    Hive plays two roles here:

    • storage for metadata. It holds information such as:
      • schema and table name
      • columns
      • data format
      • data location
    • execution
      • it can read data from distributed file systems (such as HDFS, S3, and ADLS)
      • it tells Presto how execution can be distributed.
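
As a rough illustration of the metadata role described above, the sketch below models the four pieces of information (schema and table name, columns, data format, data location) as a small Python record. It then renders the kind of `CREATE TABLE` statement that registers existing Avro files as an external table through Presto's Hive connector. The catalog name `hive`, the table and column names, and the `adl://` account are all hypothetical placeholders, and the exact table properties may differ between Presto versions, so treat this as a sketch rather than a ready-made setup.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TableMetadata:
    """The kind of information the metastore keeps for one table."""
    schema: str
    table: str
    columns: List[Tuple[str, str]]  # (column name, SQL type)
    data_format: str                # e.g. "AVRO"
    location: str                   # e.g. an adl:// URI


def to_presto_ddl(meta: TableMetadata, catalog: str = "hive") -> str:
    """Render a CREATE TABLE statement for an external table.

    Assumes the catalog is backed by Presto's Hive connector, which
    accepts the `format` and `external_location` table properties.
    """
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in meta.columns)
    return (
        f"CREATE TABLE {catalog}.{meta.schema}.{meta.table} (\n"
        f"  {cols}\n"
        f")\n"
        f"WITH (\n"
        f"  format = '{meta.data_format}',\n"
        f"  external_location = '{meta.location}'\n"
        f")"
    )


# Hypothetical example values; replace with your own schema and ADLS path.
events = TableMetadata(
    schema="default",
    table="events",
    columns=[("id", "bigint"), ("name", "varchar")],
    data_format="AVRO",
    location="adl://example.azuredatalakestore.net/data/events",
)
ddl = to_presto_ddl(events)
print(ddl)
```

Running the generated statement in Presto stores this metadata in the Hive metastore; the data files themselves stay in ADLS and are read from there at query time.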