Search code examples
apache-drill

Drill "VALIDATION ERROR: A table or view with given name already exists in schema" for empty directory


After upgrading drill on our cluster to drill-1.12.0-mapr, testing our daily ETL scripts (which all use drill for converting parquet files to tsv), a validation error ("table or view with given name already exists") is always thrown when trying to run a CREATE TABLE statement on some empty directories in a writable workspace.

[Error Id: 6ea46737-8b6a-4887-a671-4bddbea02476 on mapr002.ucera.local:31010]
at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
at org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:561)
:
:
:
Caused by: org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: A table or view with given name [/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv] already exists in schema [dfs.etl_internal]

After some brief debugging, I see that the FS directory in question under the specified dfs.etl_interal workspace (ie. /internal_etl/project/version-2/stages/storage/ACCOUNT/tsv) is in fact empty, yet still throwing these errors.

Looking for the error ID in the drillbit.log file in the associated node in the error message above, we see

2018-12-04 10:13:25,285 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO  o.a.drill.exec.work.foreman.Foreman - Query text for query id 23f92019-db56-862f-e7b9-cd51b3e174ae: create table dfs.etl_internal.`/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv` as 
select <a bunch of fields>
from dfs.etl_internal.`/internal_etl/project/version-2/stages/storage/ACCOUNT/parquet`
2018-12-04 10:13:25,406 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-12-04 10:13:25,408 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-12-04 10:13:25,893 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-12-04 10:13:25,894 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-12-04 10:13:25,898 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-12-04 10:13:25,898 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO  o.a.d.exec.store.dfs.FileSelection - FileSelection.getStatuses() took 0 ms, numFiles: 1
2018-12-04 10:13:25,905 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO  o.a.d.e.p.s.h.CreateTableHandler - User Error Occurred: A table or view with given name [/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv] already exists in schema [dfs.etl_internal]
org.apache.drill.common.exceptions.UserException: VALIDATION ERROR: A table or view with given name [/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv] already exists in schema [dfs.etl_internal]


[Error Id: 45177abc-7e9f-4678-959f-f9e0e38bc564 ]
    at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586) ~[drill-common-1.12.0-mapr.jar:1.12.0-mapr]
    at org.apache.drill.exec.planner.sql.handlers.CreateTableHandler.checkTableCreationPossibility(CreateTableHandler.java:326) [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
    at org.apache.drill.exec.planner.sql.handlers.CreateTableHandler.getPlan(CreateTableHandler.java:90) [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
    at org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:131) [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
    at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:79) [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
    at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:567) [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
    at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:264) [drill-java-exec-1.12.0-mapr.jar:1.12.0-mapr]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_151]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_151]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
2018-12-04 10:13:25,924 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO  o.apache.drill.exec.work.WorkManager - Waiting for 0 queries to complete before shutting down
2018-12-04 10:13:25,924 [23f92019-db56-862f-e7b9-cd51b3e174ae:foreman] INFO  o.apache.drill.exec.work.WorkManager - Waiting for 0 running fragments to complete before shutting down

This error occurs even when using DROP TABLE [IF EXISTS] <workspace>.<table path name> before the CREATE TABLE statement. Furthermore, the configurations for the dfs workspace itself does not appear to be changed from before upgrading to drill-1.12, see below:

:
:
"workspaces": {
"root": {
"location": "/",
"writable": false,
"defaultInputFormat": null,
"allowAccessOutsideWorkspace": false
},
"tmp": {
"location": "/tmp",
"writable": true,
"defaultInputFormat": null,
"allowAccessOutsideWorkspace": false
},
"etl_internal": {
"location": "/etl/internal",
"writable": true,
"defaultInputFormat": null,
"allowAccessOutsideWorkspace": false
}
},
:
:

Note that the full process in question is intended to mv the directory contents every day and CREATE TABLE with new data from current day (in case that makes a difference) and this process had been working fine when we were using drill-1.11.

More debugging information:

Simply deleting the .../tsv endpoint folder and relying on drill to make the directory during the CREATE TABLE statement does not work. Throws the unsurprising error

Error: VALIDATION ERROR: Table [/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv] not found
[Error Id: 02e7c088-9162-4731-9fa8-85dfd39e1dec on mapr001.ucera.local:31010] (state=,code=0)

Ie. drill does not appear to be automatically creating the table. Undoing these changes and rerunning to get the original error, we can examine the location via the sqlline interpreter interface. Doing so, we see

0: jdbc:drill:zk=mapr001:5181,mapr002:5181,ma> describe dfs.etl_internal.`/internal_etl/project/version-2/stages/storage/ACCOUNT/tsv`;
+--------------+------------+--------------+
| COLUMN_NAME  | DATA_TYPE  | IS_NULLABLE  |
+--------------+------------+--------------+
+--------------+------------+--------------+
No rows selected (1.791 seconds)

So it sees something there, but only when I make it myself, which is like a catch-22 given that the original error is complaining that something is already there.

If anyone with more experience using drill knows what could be happening here, any opinions or advice would be appreciated.


Solution

  • Looks like you have made some mistake in the process of updating Drill version on your MapR cluster.

    Please see this doc for more info: http://doc.mapr.com/display/MapR/Upgrading+to+the+Latest+Version+of+Drill
    or the last docs in case you are using the latest MapR Core version: https://mapr.com/docs/home/UpgradeGuide/PreupgradeStepsDrill.html?hl=drill%2Cupgrade
    https://mapr.com/docs/home/UpgradeGuide/PostUpgradeStepsDrill.html?hl=drill%2Cupgrade

    DROP TABLE for Drill schemaless tables works fine. See more info about Drill schemaless tables (empty directories):
    https://drill.apache.org/docs/data-sources-and-file-formats-introduction/#schemaless-tables