hadoop, hive, external-tables, hiveddl

Unable to understand the significance of the EXTERNAL keyword in Hive


I have a few doubts that I need clarified:

  1. If I create a table without the "External" keyword but specify "location", will it be an external or an internal table in Hive?
  2. If I use the "external" keyword but do not specify the 'location', the table will be saved to the hive/warehouse location, which is the default storage. In this case, will it be an external table?

Overall, I want to understand what makes a table external: the keyword "External" or specifying the 'location'. Any help will be appreciated.


Solution

  • If I create a table without the "External" keyword, but specify "location", will it be an external or internal table in the hive?

    It will be a MANAGED table (EXTERNAL=False). You can check this with DESCRIBE FORMATTED tablename;
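
    As a minimal sketch of that check (the table name demo_managed and the path /tmp/demo_managed are hypothetical, chosen only for illustration), the following creates a table with LOCATION but without EXTERNAL and then inspects its type. Note that some Hive 3 based distributions can be configured to translate plain CREATE TABLE statements, so the reported type may differ there.

    ```sql
    -- Hypothetical table name and path, for illustration only.
    -- No EXTERNAL keyword, but an explicit LOCATION.
    CREATE TABLE demo_managed (
      id   INT,
      name STRING
    )
    LOCATION '/tmp/demo_managed';

    -- Look for the "Table Type:" row in the output:
    -- it reads MANAGED_TABLE for a managed table and EXTERNAL_TABLE for an external one.
    DESCRIBE FORMATTED demo_managed;
    ```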

  • If I use the "external" keyword with a table name but do not specify the 'location', it will be saved to hive/warehouse location which is the default storage. In this case, will it be an external table?

    Yes, it will be an EXTERNAL table.
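
    A short sketch of this case, assuming a hypothetical table named demo_external: the EXTERNAL keyword is used and LOCATION is omitted, so Hive picks the default path itself.

    ```sql
    -- Print the configured default warehouse directory.
    SET hive.metastore.warehouse.dir;

    -- Hypothetical table name, for illustration only.
    -- EXTERNAL keyword, no LOCATION clause.
    CREATE EXTERNAL TABLE demo_external (
      id   INT,
      name STRING
    );

    -- "Table Type:" should read EXTERNAL_TABLE, and the "Location:" row
    -- shows the default path that Hive chose for the table.
    DESCRIBE FORMATTED demo_external;
    ```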

  • what makes a table external, the keyword "External" or specifying the 'location'

    Only the EXTERNAL keyword in CREATE TABLE (or the EXTERNAL table property) makes a table external, not the location. The EXTERNAL property is not about location in the first place. Whether a table is EXTERNAL or MANAGED defines how DROP TABLE behaves: for an EXTERNAL table, DROP TABLE deletes only the table metadata and leaves the table location untouched. For a MANAGED table, DROP TABLE removes its location with all data files as well as the metadata. There are also differences in the features supported for managed and external tables.
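
    The difference in DROP TABLE behavior can be sketched as follows. The table names and paths are hypothetical. The ALTER TABLE statement illustrates the table property route; the upper-case spelling 'EXTERNAL'='TRUE' is used because some Hive versions are reported to treat this property as case-sensitive.

    ```sql
    -- Hypothetical names and paths, for illustration only.
    CREATE EXTERNAL TABLE drop_demo_ext (id INT)
    LOCATION '/tmp/drop_demo_ext';

    CREATE TABLE drop_demo_mgd (id INT)
    LOCATION '/tmp/drop_demo_mgd';

    -- External: only the metadata is removed; files under
    -- /tmp/drop_demo_ext stay in place.
    DROP TABLE drop_demo_ext;

    -- Managed: the metadata and the table location with its data files
    -- are removed.
    DROP TABLE drop_demo_mgd;

    -- To keep the files of a managed table, it can be switched to
    -- external through the table property before dropping it.
    CREATE TABLE keep_data_demo (id INT)
    LOCATION '/tmp/keep_data_demo';

    ALTER TABLE keep_data_demo SET TBLPROPERTIES ('EXTERNAL'='TRUE');

    DROP TABLE keep_data_demo;  -- files under /tmp/keep_data_demo remain
    ```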

    In earlier versions of Hive there were no constraints on where managed or external tables could be located, and it was possible to create a MANAGED table outside hive.metastore.warehouse.dir. If LOCATION is not specified, Hive uses the value of hive.metastore.warehouse.dir for both managed and external tables. You can also create both managed and external tables on top of the same location: https://stackoverflow.com/a/54038932/2700344.
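
    A minimal sketch of two tables sharing one location, assuming a Hive version without location constraints and using hypothetical names and a hypothetical path:

    ```sql
    -- Hypothetical names and path, for illustration only.
    -- A managed and an external table defined over the same directory.
    CREATE TABLE shared_mgd (id INT)
    LOCATION '/tmp/shared_location';

    CREATE EXTERNAL TABLE shared_ext (id INT)
    LOCATION '/tmp/shared_location';

    -- Both tables read the same files.
    SELECT COUNT(*) FROM shared_mgd;
    SELECT COUNT(*) FROM shared_ext;

    -- Dropping the external table leaves the files in place, so
    -- shared_mgd still sees the data afterwards.
    DROP TABLE shared_ext;
    ```

    Dropping the managed table instead would remove the shared directory, leaving an external table that points at a location with no data.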

    See also https://stackoverflow.com/a/56957960/2700344 and https://stackoverflow.com/a/67073849/2700344