I have an HDFS directory structure like this:
/home/date_1/A/file.txt
/home/date_1/B/file.txt
/home/date_2/A/file.txt
/home/date_2/B/file.txt
...
I can create an external table
CREATE EXTERNAL TABLE table_name(col1 int, col2 string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/home'
But I don't know how to introduce the folder information 'A' or 'B' into the table. What can I do? Thanks!
In Hive you have virtual columns which you can use to read the underlying file name. The INPUT__FILE__NAME virtual column gives you the full path of the file each row was read from.
So first create the external table (as you have done). Then, when you query it, use the virtual column and split the path, as below:
SELECT
  col1,
  col2,
  INPUT__FILE__NAME AS full_filepath,
  concat_ws("/",
            reverse(split(reverse(INPUT__FILE__NAME), "/")[1]),
            reverse(split(reverse(INPUT__FILE__NAME), "/")[0])) AS split_filepath
FROM table_name;
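To answer the original question directly — getting just the folder name ('A' or 'B') into its own column — the same reverse/split trick isolates the second-to-last path component on its own (a sketch; the table and column names follow the example above):

```sql
SELECT
  col1,
  col2,
  -- reverse the path, split on "/", take element [1] (the parent folder,
  -- e.g. 'A' or 'B'), then reverse it back to restore the original text
  reverse(split(reverse(INPUT__FILE__NAME), "/")[1]) AS subfolder
FROM table_name;
```

For a path like `/home/date_1/A/file.txt`, this yields `A` in the `subfolder` column, which you can then filter or group on like any ordinary column.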
More on virtual columns can be found in the Hive documentation.