This is an extension of a previous question I asked: Is it possible to change an existing column's metadata on an EXTERNAL table that is defined by an AVRO schema file?
Question: In Hive 2.1.1, how do I INSERT data FROM a PARTITIONED table INTO a PARTITIONED table? What is the correct syntax? I have seen material all over the internet and none of it seems to be working.
Frustration: I have posted way too many questions on the same topic: how to change data from an existing STRING column into a BIGINT column on a table that is EXTERNAL and created by an AVRO metadata file and stored as AVRO. None of that seems to work. So, now I have created a duplicate *_new table with the updated metadata and I am now trying to INSERT the existing data INTO the new table by SELECTing from the existing table. And, that is not working. I have tried numerous permutations of the HQL to perform this task and have received a corresponding permutation of errors.
HQL seems to require a PHD in rocket science... This simple task should not be so f'ng difficult.
Sample query:
INSERT INTO TableName_New
--PARTITION (partition_year="2000", partition_month="01", partition_date="2000-01-01")
PARTITION (partition_year, partition_month, partition_date)
SELECT Column1, Column2
--,CAST(Column3 AS BIGINT) Column3
,Column3
,partition_year, partition_month, partition_date
--,partition_year, partition_month, partition_date
FROM TableName
WHERE partition_year="2000"
AND partition_month="01"
AND partition_date="2000-01-01"
Typical Error Message:
Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
Update:
The error appears to be in the SELECT statement. Now, I can SELECT * no problem. But, when I SELECT either by a specific column or using a WHERE constraint, then I was getting the error above in HUE. I decided to run the same command in HIVE CLI and I think I may have gotten the underlying error:
Excerpted from below:
org.apache.avro.AvroTypeException: Found long, expecting union
Now, what strikes me as odd here is that I did DROP and CREATED the new table using the modified AVRO metadata file and I migrated a single PARTITION (contains 3 files). I verified that both the AVRO metadata file and the PARTITION file have the same metadata for Column3. But, in HUE, the column's metadata shows as BIGINT. It appears that Hive's metastore is not correctly up-to-date (I suspect this is from all of the testing and troubleshooting we have done). How can I correct this?
Regardless, I decided to go ahead and create a whole new table using the old metadata and copied the partition files over within HDFS CLI. In HUE, Column3's metadata now shows correctly as STRING. I then added the partition to the table. I can SELECT * no problem, but I am still getting the same excerpted error above when I try to SELECT either by column or WHERE constraint. I am wondering if column3's metadata was updated for all of the rows within the partition file while the AVRO metadata contained at the top of the partition file was not changed. I'm kind of stuck right now and am open to ideas.
Problem 1: How can I fix the original table's metadata in Hive, considering the AVRO file is correct?
Problem 2: How do I fix the inability to SELECT from the old temp table if the partition files were somehow modified when I ran the ALTER COLUMN ... PARTITION (...) CHANGE COLUMN Column3 Column3 BIGINT CASCADE
command? Do I just run the same command but with STRING instead of BIGINT?
**Full Error Message: **
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable@439b15f2 at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1731) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable org.apache.hadoop.hive.serde2.avro.AvroGenericRecordWritable@439b15f2 at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:492) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160) ... 8 more Caused by: org.apache.avro.AvroTypeException: Found long, expecting union at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292) at org.apache.avro.io.parsing.Parser.advance(Parser.java:88) at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267) at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179) at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153) at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232) at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222) at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175) at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153) at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer$SchemaReEncoder.reencode(AvroDeserializer.java:110) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174) at org.apache.hadoop.hive.serde2.avro.AvroSerDe.deserialize(AvroSerDe.java:220) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:125) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:89) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:483) ... 9 more
There are few things you can try
I assume column 3 is of bigint type in your new table and string in your old table you can cast and use colaese on it something like colaese (cast(col3 as bigint),0) as col3 in your select statement try doing same thing on all your type casted column
Try insert overwrite
If you are able to query the select part then there must be problem in insert part Please comment you updates lets figure it out