
Dynamic UDF in Apache Drill cluster


I have a Drill cluster with 4 drillbits (Drill 1.14), but I cannot get the Dynamic UDF feature to work in the cluster. Every attempt runs into problems.

Let me present 2 scenarios:

Scenario 1
Here is the config (it is the same on all drillbits):

drill.exec: {
  cluster-id: "drill-test",
  zk: {
    connect: "vm29.local:2181,vm32.local:2181,vm39.local:2181",
    root: "drill"
  },
  sys.store.provider.zk.blobroot: "hdfs://vm29.local:9000/apps/drill/pstore/",
  http: {
    enabled: true,
    ssl_enabled: false,
    port: 8047,
    session_max_idle_secs: 3600, # Default value 1hr
    cors: {
      enabled: true,
      allowedOrigins: ["*"],
      allowedMethods: ["GET", "POST", "HEAD", "OPTIONS"],
      allowedHeaders: ["X-Requested-With", "Content-Type", "Accept", "Origin"],
    }
  }
}

drill.exec.udf: {
  retry-attempts: 5,
  directory: {
    fs: "hdfs://vm29.local:9000/",
    root: "/drill",
    base: "/udf",
    local: ${drill.exec.udf.directory.base}"/local",
    staging: ${drill.exec.udf.directory.base}"/staging",
    registry: ${drill.exec.udf.directory.base}"/registry",
    tmp: ${drill.exec.udf.directory.base}"/tmp"
  }
}

As you can see, I use HDFS for the UDF directories in this scenario.
When I put the JAR files into the 'staging' folder and run 'CREATE FUNCTION USING JAR', the function registers successfully. But then I can use it only on the drillbit where I registered it.
For example, if I run the command in the web UI on vm29, I can use the function only on vm29.
If I then try to register the JAR on a different drillbit, I get an 'already registered' error, yet the function still cannot be used there ('not found' error). The JAR files are present in hdfs://vm29.local:9000/drill/udf/registry and the metadata is in the ZK registry.
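To pin down which drillbits actually see the function, one option is to send the same query to each drillbit's REST endpoint (`POST /query.json` on the HTTP port, 8047 in the config above). The following is only a minimal sketch: the host list and the function name `my_udf` in the usage comment are placeholders, not values from this cluster, and it assumes that a failed query is reported either as an HTTP error or as a JSON body containing `errorMessage`.

```python
import json
import urllib.error
import urllib.request


def build_query_request(host, sql, port=8047):
    """Build a POST request for the drillbit REST endpoint /query.json."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def function_works_on(host, sql, timeout=30):
    """Return True if the drillbit at `host` runs `sql` without an error."""
    try:
        request = build_query_request(host, sql)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers HTTP errors, refused connections and timeouts;
        # ValueError covers a response body that is not valid JSON.
        return False
    # Assumption: a failed query carries an "errorMessage" field.
    return "errorMessage" not in body


def check_cluster(hosts, sql, probe=function_works_on):
    """Run `sql` against every host and map each host to True or False."""
    return {host: probe(host, sql) for host in hosts}


# Example usage (placeholder hosts and function name):
# hosts = ["vm29.local", "drillbit2.local", "drillbit3.local", "drillbit4.local"]
# print(check_cluster(hosts, "SELECT my_udf('x') FROM (VALUES(1))"))
```

If the result is `True` only for the drillbit where 'CREATE FUNCTION USING JAR' was run, that matches the behaviour described in this scenario.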

Scenario 2
The config is the same, with one difference: all drillbits use their local filesystem for the UDF folder.

In that case I can register and unregister the function, but I cannot use it on every drillbit ('not found' error). The JAR files are present in the /UDF/registry folder and the metadata is in the ZK registry, but the function does not work.

What am I doing wrong?
I could not find any step-by-step instructions for using the Dynamic UDF feature in a cluster. Do you know of any?

Thanks.

Update:

I just had a thought: I use the web console for queries. Maybe it makes a difference whether the function is created through the web console or through a jdbc:zk connection? (I will test.)

Cause & Results
This is a bug in Drill 1.14.
It was reported in the Drill Jira.
The fix, with an explanation, is in the Drill GitHub repository.


Solution

  • This is a regression relative to 1.13; we have opened a Jira ticket: https://issues.apache.org/jira/browse/DRILL-6762. In the meantime, you can add custom UDFs manually: https://drill.apache.org/docs/manually-adding-custom-functions-to-drill/.
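As a rough illustration of the manual route, the sketch below copies a UDF's JAR files into the `jars/3rdparty` directory of a Drill installation, which is where, as far as I know, the linked page asks for custom function JARs (binary and sources) to be placed on every node. The helper name `install_udf_jars` and the paths in the usage comment are hypothetical. It only handles the local copy, so it would have to be run on each drillbit, and the drillbits still need a restart afterwards.

```python
import shutil
from pathlib import Path


def install_udf_jars(jar_paths, drill_home):
    """Copy UDF jars into <drill_home>/jars/3rdparty and return the new paths."""
    target = Path(drill_home) / "jars" / "3rdparty"
    if not target.is_dir():
        # Refuse to create the directory: a missing one usually means
        # that drill_home does not point at a Drill installation.
        raise FileNotFoundError(f"{target} does not exist")
    return [Path(shutil.copy2(jar, target)) for jar in jar_paths]


# Example usage (hypothetical paths):
# install_udf_jars(
#     ["/tmp/my-udf-1.0.jar", "/tmp/my-udf-1.0-sources.jar"],
#     "/opt/drill",
# )
```

Unlike the Dynamic UDF feature, this does not go through the staging and registry directories at all, so it is unaffected by the bug above.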