Tags: parquet, apache-drill, snappy

Apache Drill reading Parquet


I'm working through the "Apache Drill in 10 Minutes" tutorial, but I get stuck on the Parquet part. Reading CSV files works fine, but when I try to read the sample Parquet files using the exact query from the tutorial, I get an error. I have adjusted the path correctly.

SELECT * FROM dfs.`/path/to/drill/sample-data/nation.parquet`;

Output:

Error: SYSTEM ERROR: UnsatisfiedLinkError: /tmp/snappy-1.1.7-67ad3418-1ee8-4c7a-88eb-7faf132ce52a-libsnappyjava.so: /tmp/snappy-1.1.7-67ad3418-1ee8-4c7a-88eb-7faf132ce52a-libsnappyjava.so: failed to map segment from shared object: Operation not permitted

Fragment 0:0

Please, refer to logs for more information.

[Error Id: b62d40f7-e8fb-4f78-a93a-8359033b216f on <host-id:port-id>] (state=,code=0)

If I run the query again, the error changes:

Error: SYSTEM ERROR: NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy

Fragment 0:0

Please, refer to logs for more information.

[Error Id: bfc38c87-30f2-455c-be15-d6b3aac2943d on <host-id:port-id>] (state=,code=0)

I know the error is related to Snappy compression, because uncompressed Parquet files that I create myself are read perfectly fine. The UUID part of the Snappy library file name is random and changes with each run.

How can I install Snappy, or provide the snappy-java JAR so that Drill uses it? Or is something wrong with the /tmp/ folder setup?

https://drill.apache.org/docs/drill-in-10-minutes/


Solution

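  • This does not look like a Drill issue. Most likely /tmp is mounted with noexec, or there is another permissions problem with that directory. snappy-java extracts its native library into the temp directory (here /tmp, as the libsnappyjava.so path in the first error shows) and loads it from there; on a noexec mount the dynamic loader is not allowed to map that file as executable, hence "failed to map segment from shared object: Operation not permitted". Once the class has failed to initialize, the JVM reports "NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy" on later attempts, which is why the second error looks different.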
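To confirm the noexec diagnosis, you can check the mount options of the filesystem that holds /tmp. Below is a minimal Python sketch, assuming a Linux host where /proc/mounts lists one mount per line as `device mountpoint fstype options ...`. The helper names `find_mount` and `is_noexec` are hypothetical, and the sketch does not decode escaped characters (such as `\040` for a space) in mount paths. On most systems `findmnt -T /tmp` shows the same information.

```python
import os
import sys
from typing import Optional, Set, Tuple


def find_mount(path: str, mounts_text: str) -> Optional[Tuple[str, Set[str]]]:
    """Return (mount_point, options) of the mount that contains `path`.

    `mounts_text` is assumed to be in /proc/mounts format.
    """
    path = os.path.normpath(path)
    best: Optional[Tuple[str, Set[str]]] = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], set(fields[3].split(","))
        # The mount contains `path` if it is the mount point itself or lies below it.
        if path == mount_point or path.startswith(mount_point.rstrip("/") + "/"):
            # Longest mount point wins; ">=" lets a later mount stacked on the
            # same point override an earlier one, as the kernel does.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, options)
    return best


def is_noexec(path: str, mounts_text: str) -> bool:
    """True if the filesystem containing `path` is mounted with noexec."""
    mount = find_mount(path, mounts_text)
    return mount is not None and "noexec" in mount[1]


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/tmp"
    try:
        with open("/proc/mounts") as f:
            mounts = f.read()
    except OSError:
        print("Could not read /proc/mounts; this check only works on Linux.")
    else:
        mount = find_mount(target, mounts)
        if mount is None:
            print(f"No mount found for {target}")
        else:
            state = "noexec" if "noexec" in mount[1] else "exec allowed"
            print(f"{target} is on {mount[0]} ({state})")
```

If the check reports noexec, the usual options are to remount /tmp with exec permission or to point the JVM's temporary directory (`java.io.tmpdir`) at a directory on a filesystem that allows execution.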