apache-spark, hive, apache-spark-sql, beeline, hive-query

Drop table command is not deleting the path of a Hive table that was created by Spark-SQL


I am trying to drop an internal (managed) table that was created with Spark-SQL. The table itself gets dropped, but the table's location on HDFS still exists. Can someone let me know how to do this?

I tried both Beeline and Spark-SQL:

    create table something(hello string)
    PARTITIONED BY(date_d string)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY "^"
    LOCATION "hdfs://path";

    Drop table something;
No rows affected (0.945 seconds)

Thanks


Solution

  • Spark internally uses the Hive metastore to create the table. If the table is created as an external Hive table from Spark, i.e. the data lives in HDFS and Hive only provides a table view on top of it, the drop table command deletes only the metastore information and does not delete the data from HDFS.
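
    To confirm how the metastore registered the table, you can check its table type (this assumes the table name something from the question):

      DESCRIBE FORMATTED something;

    If the output shows the table type as EXTERNAL_TABLE, drop table only removes the metastore entry and leaves the HDFS directory in place.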

    So there are a couple of alternative strategies you could take:

    1. Manually delete the data from HDFS using the hadoop fs -rm -r command (a sketch is shown after this list).
    2. Run alter table on the table you want to delete to change it from an external table to an internal (managed) table, then drop the table.

      ALTER TABLE <table-name> SET TBLPROPERTIES('EXTERNAL'='FALSE');

      DROP TABLE <table-name>;

    The first statement converts the external table to an internal table, and the second statement drops the table along with its data.
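
    For option 1, a minimal sketch of the manual cleanup from the shell, assuming the placeholder location hdfs://path from the question (replace it with the table's actual directory):

      # remove the table directory that DROP TABLE left behind
      hadoop fs -rm -r "hdfs://path"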