
Hive external table LOCATION vs LOAD path


From reading online about external and managed tables, I understand that we need to specify LOCATION when creating an external table, because Hive will create the table at the given location, whereas a managed table uses the default directory set in hive.metastore.warehouse.dir. Please correct me if I have stated anything wrong.

What confuses me is:

  1. Is the LOCATION clause used to specify where the data for an external table already exists, or where to create the directory that will store the actual data?
  2. If the LOCATION clause specifies where the data exists, then why do we use the INPATH clause in the LOAD statement?

Solution

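    1. The LOCATION clause in the DDL of an external table specifies the HDFS directory where the table's data is stored. If files already exist there, Hive reads them as the table's data; otherwise it is the directory where later loads place their files. Either way, when we query the table, the data is read from this path.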

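    2. The path in LOAD DATA INPATH is the path of the source file whose data is loaded into the table. The source can be either a local file path (with the LOCAL keyword) or an HDFS file path. A local file is copied into the table's location, while an HDFS file is moved there.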
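
To make the two points concrete, here is a minimal HiveQL sketch. The table name `sales_ext`, its columns, and all paths are hypothetical and chosen only for illustration; adjust the row format to match your files.

```sql
-- Hypothetical table, columns, and paths, for illustration only.
-- LOCATION names the HDFS directory that holds the table's data.
CREATE EXTERNAL TABLE sales_ext (
  id INT,
  amount DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/sales_ext';

-- Any files already under /data/sales_ext are read by this query.
SELECT COUNT(*) FROM sales_ext;

-- INPATH names the source file. With LOCAL, the file is copied
-- from the local filesystem into the table's LOCATION.
LOAD DATA LOCAL INPATH '/tmp/sales_local.csv' INTO TABLE sales_ext;

-- Without LOCAL, the source is an HDFS path and the file is moved
-- into the table's LOCATION.
LOAD DATA INPATH '/staging/sales_hdfs.csv' INTO TABLE sales_ext;
```

In other words, LOCATION is the destination (where the table's data lives), and INPATH is the source of a particular load. If the data is already sitting in the LOCATION directory, no LOAD statement should be needed.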

    Hope this clears up your confusion.