I have defined a table as such:
create external table PageViews (Userid string, Page_View string)
partitioned by (ds string)
row format as delimited fields terminated by ','
stored as textfile location '/user/data';
I do not want all the files in the /user/data directory to be used as part of the table. Is it possible for me to do the following?
location 'user/data/*.csv'
I came across this thread when I had a similar problem to solve. I was able to resolve it by using a custom SerDe. I then added SerDe properties which guided what RegEx to apply to the file name patterns for any particular table.
A custom SerDe might seem overkill if you are only dealing with standard CSV files, I had a more complex file format to deal with. Still this is a very viable solution if you don't shy away from writing some Java. It is particularly useful when you are unable to restructure the data in your storage location and you are looking for a very specific file pattern among a disproportionately large file set.
> CREATE EXTERNAL TABLE PageViews (Userid string, Page_View string)
> ROW FORMAT SERDE 'com.something.MySimpleSerDe'
> WITH SERDEPROPERTIES ( "input.regex" = "*.csv")
> LOCATION '/user/data';