raspberry-pi apache-drill

Can I run Apache Drill on Raspberry Pis and discover the physical cost of a query plan?


I watched this video about Apache Drill (https://www.youtube.com/watch?time_continue=14&v=0rurIzOkTIg), which says I can install a Drillbit on each node of my cluster and the Drill engine will choose the best physical plan to execute a query. I can then run EXPLAIN PLAN for a query (https://drill.apache.org/docs/query-plans/) and see whether Drill decided to use data locality, whether it processes the data in memory, and what other cost decisions it made. Another reference I was reading is "Apache Drill vs Spark".

I also see that Drill has a storage plugin for file systems, so I imagine that I can install Drill on 3 computers and query the log files stored on them.

I wonder if it is possible to install Drill on Raspberry Pis that have a variety of connections (wired, wireless, radio, ...) and execute a query on log files located on these Pis. Is this also something Drill is meant for?


Solution

  • Drill can indeed query log files from different storage systems, but I am not sure that a Raspberry Pi's specs meet Drill's resource requirements:

    The default memory for a Drillbit is 8G, but Drill prefers 16G or more depending on the workload

    https://drill.apache.org/docs/configuring-drill-memory/

    That said, it may be possible to run a Drillbit on a machine with less memory, but that memory will not be enough to process large data sets.
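If a Drillbit does start on one of the Pis, the cost estimates from the question can be inspected by asking Drill for the plan with all attributes. The following is only a minimal sketch in Python, assuming the Drillbit's REST interface is reachable on its default web port 8047 at localhost. The helper names (build_explain, build_request, explain) and the example log path are hypothetical, and the exact shape of the response may differ between Drill versions.

```python
import json
import urllib.request

# Assumption: a Drillbit is running locally with its web/REST port at the default 8047.
DRILL_URL = "http://localhost:8047/query.json"


def build_explain(sql: str) -> str:
    """Wrap a query so Drill returns its plan, including cost attributes."""
    # Drop surrounding whitespace and any trailing semicolon before wrapping.
    return "EXPLAIN PLAN INCLUDING ALL ATTRIBUTES FOR " + sql.strip().rstrip(";")


def build_request(sql: str, url: str = DRILL_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request for Drill's REST query endpoint."""
    payload = {"queryType": "SQL", "query": build_explain(sql)}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def explain(sql: str, url: str = DRILL_URL, timeout: float = 30.0) -> dict:
    """Send the EXPLAIN PLAN request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(sql, url), timeout=timeout) as resp:
        return json.load(resp)
```

For example, explain("SELECT * FROM dfs.`/var/log/example.json` LIMIT 10") would submit the wrapped query (the path here is made up). In the plan text that comes back, each operator should carry its estimated row count and cumulative cost, which is where the physical cost decisions show up.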