I have file on HDFS with 78 GB size
I need to create an Impala External table over it to perform some grouping and aggregation on data available
Problem The file contain headers.
Question Is there any way to skip headers from file while reading the file and do querying on the rest of data.
Although i have a way to solve the problem by copying file to local then remove the headers and then copy the updated file to HDFS again but that is not feasible as the file size is too large
Please suggest if anyone have any idea...
Any suggestions will be appreciated....
Thanks in advance
UPDATE or DELETE row operations are not available in Hive/Impala. So you should simulate DELETE as